Abstract



We present a framework for studying the problem of media streaming in technology and cost 
heterogeneous environments. We first address the problem of efficient streaming in a technology- 
heterogeneous setting. We employ random linear network coding to simplify the packet selection 
strategies and alleviate issues such as duplicate packet reception. Then, we study the problem of media
streaming from multiple cost-heterogeneous access networks. Our objective is to characterize analytically 
the trade-off between access cost and user experience. We model the Quality of user Experience (QoE) as 
the probability of interruption in playback as well as the initial waiting time. We design and characterize 
various control policies, and formulate the optimal control problem using a Markov Decision Process
(MDP) with a probabilistic constraint. We present a characterization of the optimal policy using the
Hamilton-Jacobi-Bellman (HJB) equation. For a fluid approximation model, we provide an exact and
explicit characterization of a threshold policy and prove its optimality using the HJB equation.
Our simulation results show that, under a properly designed control policy, the existence of an alternative
access technology as a complement to a primary access network can significantly improve the user
experience without any bandwidth over-provisioning.

I. Introduction

Media streaming is fast becoming the dominant application on the Internet [1]. The popularity 
of available online content has been accompanied by the growing usage of wireless handheld 
devices as the preferred means of media access. The predictions by Cisco Visual Networking 
Index [2] show that by 2015, the various forms of video (TV, VoD, Internet Video, and P2P) will 
exceed 90 percent of global consumer traffic, and 66 percent of total mobile traffic. In order to 
cope with the demand, the wireless service providers generally build new infrastructure. Another
approach that seems to be gaining momentum in the industry is offloading mobile data traffic
onto another network through dual-mode devices, e.g., devices with an additional WiFi interface [3]. This approach
requires the wireless devices to operate seamlessly in an environment with heterogeneous access
methods (WiFi, 3G and 4G) and different access costs (cf. Figure 1). For example, accessing
public WiFi networks is free but unreliable, while there are additional charges associated with
more reliable 3G or 4G data networks. The goal of this work is to design network access
policies that minimize the access cost in such heterogeneous environments, while guaranteeing 
an acceptable level of quality of user experience. 










Fig. 1. Media streaming from heterogeneous servers. 

In particular, we focus on media streaming applications which are intrinsically delay-sensitive. 
Hence, they need to be managed differently from the traditional less delay-sensitive applications
such as Web, Email and file downloads. Most of the current approaches for providing a reasonable
Quality of Service (QoS) for streaming applications are based on resource over-provisioning,
without considering the transient behavior of the service provided to such applications. In this
work, we pay special attention to communication and control techniques that are specifically 
designed with Quality of user Experience (QoE) in mind. The goal, on one hand, is to make the 
optimal use of the limited and possibly unreliable resources to provide a seamless experience for 
the user. On the other hand, we would like to provide a tool for the service providers to improve 
their service delivery (specifically for delay-sensitive applications) in the most economical way. 
Our contributions are summarized in the following. 

We first address the problem of efficient streaming in technology-heterogeneous settings, where 
a user receives a stream over multiple paths from different servers. Each server can be a wireless
access point or another peer operating as a server. We consider a model in which the communication
link between the receiver and each server is unreliable, and hence, it takes a random period of
time for each packet to arrive at the receiver from the time that the packet is requested from
a particular server. One of the major difficulties with such multi-server systems is the packet
tracking and duplicate packet reception problem, i.e., the receiver needs to keep track of the index
of the packets it is requesting from each server to avoid requesting duplicate packets. Since the
requested information is delay sensitive, if a requested packet does not arrive within some time
interval, the receiver needs to request the packet from another server. This may eventually result
in receiving duplicate packets and wasting resources. We show that applying
random linear network coding (RLNC) [4] across the packets within each block of the media
file alleviates this issue: with high probability, no redundant
information is delivered to the receiver.

We would like to emphasize that one of the critical roles of network coding techniques in 
this work, other than efficient and seamless streaming, is to simplify greatly the communication 
models, so that we can focus on end-user metrics and trade-offs. For example, if each server 
can effectively transmit packets according to an independent Poisson process, using RLNC we 
can merge these processes into one Poisson process of sum rate. Hence, the system model boils 
down to a single-server single-receiver system. 

We then study the problem of media streaming in a cost-heterogeneous environment. We 
consider a system wherein network coding is used to ensure that packet identities can be ignored, 
and packets may potentially be obtained from two classes of servers with different rates of 
transmission. The wireless channel is unreliable, and we assume that each server can deliver
packets according to a Poisson process with a known rate. Further, the costs of accessing the
two servers are different; for simplicity we assume that one of the servers is free. Thus, our 
goal is to develop an algorithm that switches between the free and the costly servers in order 
to satisfy the desired user experience at the lowest cost. 

The user experience metrics that we consider in this work are the initial buffering delay 
before the media playback, and the probability of experiencing an interruption throughout media 
playback. Interruption probability captures the reliability of media playback. Such metrics best
capture the user experience for most media streaming applications, e.g., Internet video, TV, and
Video on Demand (VoD), where the user may tolerate some initial delay, but expects a smooth
sequential playback. In [5], we characterized the optimal trade-off between these metrics for a
single-server single-receiver system. In particular, we established the following relation:

$$\text{Probability of interruption} = e^{-I(R) \cdot (\text{initial buffering})}, \qquad (1)$$

where $I(R)$ is the interruption exponent or reliability function, which depends on the arrival rate
$R$ of the stream. This result is analogous to information-theoretic error exponent results relating
the error probability of a code to the block length of that code.

In a cost-heterogeneous system, the user experience such as initial waiting time may be 
improved by simultaneously accessing free and costly access methods. This adds another dimension
to the problem the end-user is facing. Certain levels of user satisfaction can only be
achieved by paying a premium for extra resource availability. Figure 2 illustrates a conceptual 
three-dimensional cost-delay-reliability trade-off curve. 

The objective of this paper is to understand the trade-off between initial waiting time, and 
the usage cost for attaining a target probability of interruption, and design control policies to 
achieve the optimal trade-off curve. We study several classes of server selection policies. Using 
the QoE trade-offs for a single-server system, we obtain a lower bound on the cost of offline 
policies that do not observe the trajectory of packets received. We show that such policies have 
a threshold form in terms of the time of association with the costly server. Using the offline 
algorithm as a starting point, we develop an online algorithm with lower cost that has a threshold 
form - both free and costly servers are used until the queue length reaches a threshold, followed 
by only free server usage. We then develop an online risky algorithm in which the risk of 
interruption is spread out across the trajectory. Here, only the free server is used whenever the
queue length is above a certain threshold, while both servers are used when the queue length
is below the threshold. The threshold is designed as a function of the initial buffering and
the desired interruption probability. We numerically compare the performance of the proposed
control policies. Our simulation results show that the online risky algorithm performs the best. Moreover,
we observe that the existence of costly networks as a complement to unreliable but cheaper
networks can significantly improve the user experience without incurring a significant cost.

Fig. 2. Trade-off between the achievable QoE metrics and cost of communication.

We formulate the problem of finding the optimal network association policy as a Markov 
Decision Process (MDP) with a probabilistic constraint. Similarly to the Bellman equation 
proposed by Chen [6], for a discrete-time MDP with probabilistic constraints, we write the 
Hamilton-Jacobi-Bellman (HJB) equation for our continuous-time problem by proper state space 
expansion. The HJB equation is instrumental in optimality verification of a particular control 
policy for which the expected cost is explicitly characterized. However, due to the discontinuity
of the queue-length process for the Poisson arrival model, closed-form characterizations of
our proposed policies are not available for the verification of the HJB equation. Therefore,
we consider a fluid approximation model, where the arrival process is modeled as a controlled
Brownian motion with a drift. In this case, we provide an exact and explicit characterization of a
threshold policy that satisfies the QoE constraints. We show that the expected cost corresponding
to this threshold policy is indeed the solution of the corresponding HJB equation, thus proving
the optimality of such policies.

A. Related Work 

The set of works related to this work spans several distinct areas of the literature. One of the
major difficulties in the literature is the notion of delay, which greatly varies across different 
applications and time scales at which the system is modeled. The role of delay-related metrics has 
been extensive in the literature on Network Optimization and Control. Neely [7], [8] employs 
Lyapunov optimization techniques to study the delay analysis of stochastic networks, and its 
trade-offs with other utility functions. Other related works such as [9], [10] take the flow-based 
optimization approach, also known as Network Utility Maximization (NUM), to maximize the 
delay-related utility of the users. Closer to our work is the one by Hou and Kumar [11] that 
considers per-packet delay constraints and successful delivery ratio as user experience metrics. 
Such flow-based approaches are essentially operating at the steady state of the system, and fail 
to capture the end-user experience for delay-sensitive applications of interest. 

Media streaming, particularly in the area of P2P networks, has attracted significant recent 
interest. For example, works such as [12], [13], [14], [15] develop analytical models on the 
trade-off between the steady state probability of missing a block and buffer size under different 
block selection policies. Unlike our model, they consider live streaming, e.g. video conferencing, 
with deterministic channels. However, we focus on content that is at least partially cached 
at multiple locations, and must be streamed over one or more unreliable channels. Further, 
our analysis is on transient effects — we are interested in the first time that media playback is 
interrupted as a function of the initial amount of buffering. Also related to our work is [16], which 
considers two possible wireless access methods (WiFi and UMTS) for file delivery, assuming 
particular throughput models for each access method. In contrast to this work, packet arrivals 
are stochastic in our model, and our streaming application requires hard constraints on quality 
of user experience. 

Another body of related work is the literature on constrained Markov decision processes. There 
are two main approaches to these problems. Altman [17], Piunovskiy [18], [19] and Feinberg 
and Shwartz [20] take a convex analytic approach leading to linear programs for obtaining the 
optimal policies. On the other hand, Chen [6], Chen and Blankenship [21], Piunovskiy [22] 
use the more natural and straightforward Dynamic Programming approach to characterize all
optimal policies. These works mainly focus on different variations of the discrete-time Markov
decision processes. In this work, we take the dynamic programming approach for the control of 
a continuous-time Markovian process. Further, we employ stochastic calculus techniques used 
in treatment of stochastic control problems [23] to properly characterize the optimal control 
policies. 

The rest of this paper is organized as follows. In Section II, we discuss using network 
coding techniques to guarantee efficient streaming in a technology-heterogeneous system. Section III
describes the system model and QoE metrics for a media streaming scenario from
cost-heterogeneous servers. In Section IV, we present and compare several server association 
policies. The dynamic programming approach for characterization of the optimal control policy is 
discussed in Section V. We present the fluid approximation model and establish the optimality 
of an online threshold policy in Section VI. Finally, Section VII provides a summary of the 
contributions of this paper with pointers for potential extensions in the future. 

II. Network Coding for Efficient Streaming in Technology-heterogeneous Systems

In this part, we study the problem of streaming a media file from multiple servers to a 
single receiver over unreliable communication channels. Each of the servers could be a wireless 
access point, base station, another peer, or any combination of the above. Such servers may 
operate under different protocols in different ranges of the spectrum such as WiFi (IEEE 802.11),
WiMAX (IEEE 802.16), HSPA, EvDo, LTE, etc. We refer to such a system as a technology-
heterogeneous multi-server system. In this setup, the receiver can request different pieces of
the media file from different servers. Requesting packets from each server may cause delays
due to channel uncertainty. However, requesting one packet from multiple servers introduces the 
need to keep track of packets, and the duplicate packet reception problem. In this section, we 
discuss methods that enable efficient streaming across different paths and network interfaces. 
This greatly simplifies the model when analyzing such systems. 

In order to resolve such issues of the multi-server and technology-heterogeneous systems, let 
us take a closer look at the process of media streaming across different layers. Media files are 
divided into blocks of relatively large size, each consisting of several frames. The video coding 
is such that all the frames in the block need to be available before any frames can be played.
Blocks are requested in sequence by the playback application at the user end. The server (or
other peers) packetizes each requested block and transmits the packets to the user, as in Figure 3.

Fig. 3. The media player (application layer) requires complete blocks. At the network layer each block is divided into packets
and delivered.



Now consider the scenario, illustrated in Figure 4, where there are multiple paths to reach a 
particular server. Each of these paths could pass through different network infrastructures. For 
example, in Figure 4, one of the paths is using the WiFi access point, while the other one is 
formed by the LTE (Long Term Evolution) network. 

The conventional approach in exploiting the path diversity in such scenarios is scheduling each 
packet to be delivered over one of the available paths. For instance, odd-numbered packets are 
assigned to path 1, and even-numbered packets are assigned to path 2. This approach requires 
a deterministic and reliable setting, where each path is lossless and the capacity and end-to-end 
delay of each path is known. However, the wireless medium is intrinsically unreliable and time 
varying. Moreover, flow dynamics in other parts of the network may result in congestion on a 
particular path. Therefore, the slowest or most unreliable path becomes the bottleneck. In order 
to compensate for that, the scheduler may add some redundancy by sending the same packet 
over multiple paths, which results in duplicate packet reception and loss of performance. There 
is a significant effort to use proper scheduling and control mechanisms to reduce these problems. 
For more information on this approach, generally known as MultiPath TCP (MPTCP), please
refer to the works by Wischik et al. [24], [25], and the IETF working draft on MPTCP [26].

Fig. 4. Streaming over multiple paths/interfaces.

Fig. 5. Streaming over multiple paths/interfaces using Network Coding.

We propose random linear network coding (RLNC) to alleviate the duplicate packet reception 
problem. Figure 5 illustrates an example. Here, instead of requesting a particular packet in block 
i, the receiver simply requests a random linear combination of all the packets in block i. The 
coefficients of each combination are chosen uniformly at random from a Galois field of size
q. The coded packets delivered to the receiver can be thought of as linear equations, where
the unknowns are the original packets in block i. Block i can be fully recovered by solving a
system of linear equations if it is full rank. Note that we can embed the coding coefficients in
the header of each coded packet so that the receiver can form the system of linear equations.
For more implementation details, please refer to Sundararajan et al. [27]. It can be shown that if
the field size q is large enough, the received linear equations are linearly independent with very
high probability [4]. Therefore, for recovering a block of W packets, it is sufficient to receive
W linearly independent coded packets from different peers. Each received coded packet is
independent of the previous ones with probability $1 - \delta(q)$, where $\delta(q) \to 0$ as $q \to \infty$.
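The encoding and decoding steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prime field size `P = 257`, the helper names `encode` and `decode`, and the toy block dimensions are all assumptions chosen to keep the arithmetic simple. The receiver keeps collecting coded packets, regardless of which server sent them, until the coefficient matrix has full rank.

```python
import random

P = 257  # assumed field size q; a prime so that arithmetic mod P forms a field

def encode(block, rng):
    """Return one coded packet: (coefficients, random linear combination of the block)."""
    coeffs = [rng.randrange(P) for _ in block]
    length = len(block[0])
    payload = [sum(c * pkt[j] for c, pkt in zip(coeffs, block)) % P
               for j in range(length)]
    return coeffs, payload

def decode(coded, w):
    """Gauss-Jordan elimination mod P; returns the w original packets, or None if rank < w."""
    rows = [list(c) + list(p) for c, p in coded]
    for col in range(w):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            return None                      # not yet full rank
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)     # modular inverse (Python 3.8+)
        rows[col] = [(x * inv) % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[i][w:] for i in range(w)]

rng = random.Random(0)
block = [[rng.randrange(256) for _ in range(8)] for _ in range(4)]  # W = 4 packets
received, decoded = [], None
while decoded is None:
    received.append(encode(block, rng))      # may arrive from any server or path
    if len(received) >= len(block):
        decoded = decode(received, len(block))
```

Because the receiver never asks for a packet by index, no per-server bookkeeping is needed; an extra coded packet is only wasted in the rare event that it is linearly dependent on those already received.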

By removing the notion of unique identity assigned to each packet, network coding allows 
significant simplification of the scheduling and flow control tasks at the server. For instance, 
if one of the paths gets congested or drops a few of the packets, the server may complete the
block transfer by sending more coded packets over the other paths. Hence, the sender may
perform TCP-like flow management and congestion control on each of the paths independently.
Therefore, network coding provides a means to homogenize a technology-heterogeneous system
as if the receiver only had a single interface.

Note that such random linear coding does not introduce additional decoding delay for each 
block, since the frames in a block can only be played out when the whole block is received. So 
there is no difference in delay whether the end-user received W uncoded packets of the block 
or W independent coded packets that can then be decoded. 

In the following, we discuss the conditions under which we can convert a technology-heterogeneous 
multi-server system to a single-path single-server system. Consider a single user receiving a 
media file from various servers it is connected to. Assume that the media file is divided into 
blocks of W packets. Each server sends random linear combinations of the packets within the 
current block to the receiver. We assume that the linear combination coefficients are selected 
from a Galois field of large enough size, so that no redundant packet is delivered to the receiver. 
Moreover, we assume that the block size W is small compared to the total length of the file, but 
large enough to ignore the boundary effects of moving from one block to the next. For simplicity 
of the analysis, we assume in this work that the arrival process of packets from each server is 
a Poisson process. Since network coding allows for independent flow control on each path, we
may assume that the arrival process from each server is independent of other arrival processes.
Moreover, since we can assume no redundant packet is delivered from different peers, we can
combine the arrival processes into one Poisson process of the sum-rate R. Thus, our simplified
model is just a single-server single-receiver system.

We summarize the above discussions into the following Assumption, which is the key for 
development of the analytical results in the subsequent parts. 

Assumption 1. Consider one or more servers streaming a single media file to a single client over
$m$ independent paths using random linear network coding. The packet delivery process over path
$k$ is modeled as a Poisson process of rate $R_k$. The effective packet delivery process observed at
the receiver is a Poisson process of rate $R = \sum_{k=1}^{m} R_k$.

Note that, for simplicity of the system model and analysis, we are neglecting a few complexities
associated with the network coding approach, such as the effect of field size, feedback imperfections,
and other uncertainties.
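The superposition property behind Assumption 1 can be checked numerically. The following Python sketch merges independent Poisson arrival streams and compares the empirical rate with the sum of the per-path rates; the rates, horizon, and seed are hypothetical values chosen only for illustration.

```python
import random

def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a Poisson process of the given rate on [0, horizon]."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)           # i.i.d. exponential inter-arrival times
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(1)
rates = [0.4, 0.7, 1.1]                      # hypothetical per-path rates R_k
horizon = 20000.0

# With network coding every arriving packet is useful, so the receiver
# simply sees the union of all arrival epochs.
merged = sorted(t for r in rates for t in poisson_arrivals(r, horizon, rng))

empirical_rate = len(merged) / horizon
gaps = [b - a for a, b in zip(merged, merged[1:])]
mean_gap = sum(gaps) / len(gaps)
```

Under these assumptions, `empirical_rate` should be close to `sum(rates)` and `mean_gap` close to its reciprocal, which is what a single Poisson process of the sum rate would produce.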

Assumption 1 provides the necessary tool for analyzing technology-heterogeneous multi-server 
systems as a single-server system. For instance, we can apply most of the results of [5] on 
fundamental delay-interruption trade-offs. This is essential for tractability of the analysis of 
cost-heterogeneous systems, which is the focus of the subsequent part. 

III. System Model and QoE Metrics

We consider a media streaming system as follows. A single user is receiving a media stream 
of infinite size, from various servers or access points. The receiver first buffers D packets from 
the beginning of the file, and then starts the playback at unit rate. 

We assume that time is continuous, and the arrival process of packets from each server is 
a Poisson process independent of other arrival processes. Further, we assume that each server 
sends random linear combinations of the packets in the source file. Therefore, by discussions of
Section II, no redundant packet is delivered from different servers. Therefore, we can combine 
the arrival processes of any subset of the servers into one Poisson process of rate equal to the 
sum of the rates from the corresponding servers (cf. Assumption 1). 

There are two types of servers in the system: free^1 servers and costly ones. There is no
cost associated with receiving packets from a free server, but a unit cost is incurred per unit
time the costly servers are used. As described above, we can combine all the free servers into
one free server from which packets arrive according to a Poisson process of rate $R_0$. Similarly,
we can merge all of the costly servers into one costly server with effective rate $R_c$. At any
time $t$, the user has the option to use packets only from the free server or from both the free
and the costly servers. In the latter case, the packets arrive according to a Poisson process of
rate

$$R_1 = R_0 + R_c.$$

^1 The contributions of this work still hold if both servers are costly with different access costs. Here, we normalize the access
cost of the cheaper server to zero.

The user's action at time $t$ is denoted by $u_t \in \{0, 1\}$, where $u_t = 0$ if only the free server is used
at time $t$, while $u_t = 1$ if both free and costly servers are used. We assume that the parameters
$R_0$ and $R_1$ are known at the receiver. Figure 6 illustrates the system model.

Fig. 6. Streaming from two classes of servers: costly and free.



The dynamics of the receiver's buffer size (queue length) $Q_t$ can be described as follows:

$$Q_t = D + N_t + \int_0^t u_\tau \, dN_\tau^c - t, \qquad (2)$$

where $D$ is the initial buffer size, $N_t$ is a Poisson process of rate $R_0$, and $N_t^c$ is a Poisson counter
of rate $R_c$ that is independent of the process $N_t$. The last term corresponds to the unit rate of
media playback.
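To make the buffer dynamics concrete, here is a small event-driven Python sketch of the queue-length process. It is a simplification, not the paper's simulator: the function name `simulate`, the finite `horizon`, and the rule that the action is re-evaluated only at arrival epochs are assumptions. The last assumption is exact for constant policies and for policies that switch only when the buffer jumps upward.

```python
import random

def simulate(D, R0, Rc, policy, horizon, rng):
    """Simulate the buffer until it empties or the horizon is reached.

    policy(t, q) returns 0 (free server only) or 1 (both servers).
    Returns (interrupted, time spent using the costly server).
    """
    t, q, cost = 0.0, float(D), 0.0
    while t < horizon:
        u = policy(t, q)                     # assumed held until the next arrival
        dt = rng.expovariate(R0 + u * Rc)    # time until the next useful packet
        if dt >= q:                          # playback drains the buffer first
            return True, cost + u * q
        t += dt
        cost += u * dt
        q += 1.0 - dt                        # one packet arrives, dt units are played
    return False, cost

rng = random.Random(2)
free_only = lambda t, q: 0                   # degenerate policy: never pay
runs = 4000
results = [simulate(1.0, 2.0, 1.0, free_only, 50.0, rng) for _ in range(runs)]
p_hat = sum(hit for hit, _ in results) / runs  # Monte Carlo interruption estimate
```

With a free-server rate of 2 and one packet of initial buffering, the estimate `p_hat` can be compared against the single-server interruption probability from [5]; other policies can be plugged in by passing a different `policy` function.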

The user's association (control^2) policy is formally defined below.

^2 Throughout the rest of this paper, we use the notions of control policy and association policy interchangeably.

Definition 1. [Control Policy] Let

$$h_t = \{Q_s : 0 \le s \le t\} \cup \{u_s : 0 \le s < t\}$$

denote the history of the buffer sizes and actions up to time $t$, and let $\mathcal{H}$ be the set of all histories
for all $t$. A deterministic association policy, denoted by $\pi$, is a mapping $\pi : \mathcal{H} \to \{0, 1\}$, where
at any time $t$

$$\pi(h_t) = \begin{cases} 0, & \text{if only the free server is chosen}, \\ 1, & \text{if both servers are chosen}. \end{cases}$$

Denote by $\Pi$ the set of all such control policies.

We use the initial buffer size $D$ and the interruption probability as QoE metrics. The interruption
event occurs when the queue length $Q_t$ reaches zero. However, this event not only depends on
the initial buffer size $D$, but also on the control policy $\pi$. To emphasize this dependency, we
denote the interruption probability by

$$p^{\pi}(D) = \Pr\{\tau_0 < \infty\}, \qquad (3)$$

where

$$\tau_0 = \inf\{t : Q_t = 0\}.$$

Definition 2. The policy $\pi$ is defined to be $(D, \epsilon)$-feasible if $p^{\pi}(D) \le \epsilon$. The set of all such
feasible policies is denoted by $\Pi(D, \epsilon)$.

The third metric that we consider in this work is the expected cost of using the costly server,
which is proportional to the expected usage time of the costly server. For any $(D, \epsilon)$, the usage
cost of a $(D, \epsilon)$-feasible policy $\pi$ is given by^3

$$J^{\pi}(D, \epsilon) = \mathbb{E}\left[ \int_0^{\tau_0} u_t \, dt \right]. \qquad (4)$$

The value function or optimal cost function $V$ is defined as

$$V(D, \epsilon) = \min_{\pi \in \Pi(D, \epsilon)} J^{\pi}(D, \epsilon), \qquad (5)$$

and the optimal policy $\pi^*$ is defined as the optimal solution of the minimization problem in (5).

^3 Throughout this work, we use the convention that the cost of an infeasible policy is infinite.

In our model, the user expects to buffer no more than $D$ packets and have an interruption-free
experience with probability higher than a desired level $1 - \epsilon$. Note that there are trade-offs
among the interruption probability $\epsilon$, the initial buffer size $D$, and the usage cost. These trade-offs
depend on the association policy as well as the system parameters $R_0$, $R_c$ and $F$.

Throughout the rest of this work, we study the case that $R_0 > 1$ and the file size $F$ goes to
infinity, since the control policies in this case take simpler forms. Moreover, the cost of such
control policies in this case provides an upper bound for the finite file size case. The following
Lemma summarizes the main result from [5], characterizing the fundamental trade-off between 
the interruption probability and initial buffering, for a single-server single-receiver system. 

Lemma 1. Consider a single receiver receiving a media stream from a single server according 
to a Poisson process of rate R. Let D denote the initial buffer size before the playback, and set 
the playback rate to one. The probability of interruption in media playback is given by 

$$p(D) = e^{-I(R) D}, \qquad (6)$$

where $I(R)$ is the largest root of $\gamma(r) = r + R(e^{-r} - 1)$.
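The exponent in Lemma 1 has no closed form, but it is easy to compute numerically. The sketch below finds the largest root of $\gamma(r) = r + R(e^{-r} - 1)$ by bisection; the function names and the bracketing interval $[\log R, R]$ are choices made here for illustration, not taken from [5]. The bracket works for $R > 1$ because $\gamma$ attains its negative minimum at $r = \log R$ and $\gamma(R) = R e^{-R} > 0$.

```python
import math

def gamma(r, R):
    """gamma(r) = r + R (e^{-r} - 1), whose largest root is the exponent I(R)."""
    return r + R * (math.exp(-r) - 1.0)

def interruption_exponent(R, tol=1e-12):
    """Largest root of gamma for arrival rate R, found by bisection."""
    if R <= 1.0:
        return 0.0                 # for R <= 1 the largest root is r = 0
    lo, hi = math.log(R), R        # gamma(lo) < 0 < gamma(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma(mid, R) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def interruption_probability(D, R):
    """Evaluate p(D) = exp(-I(R) D) from Lemma 1."""
    return math.exp(-interruption_exponent(R) * D)
```

For example, `interruption_exponent(2.0)` is roughly 1.59, so each additional packet of initial buffering reduces the interruption probability by a factor of about five when the arrival rate is twice the playback rate.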

We first characterize the region of interest in the space of QoE metrics. In this region, a 
feasible control policy exists and is non-degenerate. We then use these results to design proper 
association policies. 

Theorem 1. Let $(D, \epsilon)$ be the user's QoE requirement when streaming an infinite file from two
servers. The arrival rate of the free server is given by $R_0 > 1$, and the total arrival rate when
using the costly server is denoted by $R_1 > R_0$. Then
(a) For any $(D, \epsilon)$ such that $D \ge \frac{1}{I(R_0)} \log\left(\frac{1}{\epsilon}\right)$,

$$\min_{\pi \in \Pi} J^{\pi}(D, \epsilon) = 0.$$

(b) For any $(D, \epsilon)$ such that $D < \frac{1}{I(R_1)} \log\left(\frac{1}{\epsilon}\right)$,

$$\min_{\pi \in \Pi} J^{\pi}(D, \epsilon) = \infty.$$

Proof: Consider the degenerate policy $\pi_0 \equiv 0$. This policy is equivalent to a single-server
system with arrival rate $R = R_0$. By Lemma 1, the policy $\pi_0$ is $(D, \epsilon)$-feasible for all
$D \ge \frac{1}{I(R_0)} \log\left(\frac{1}{\epsilon}\right)$. Note that by (4) this policy does not incur any cost, which results in part (a).

February 27, 2013 DRAFT

Moreover, for all $(D, \epsilon)$ with $D < \frac{1}{I(R_1)} \log\left(\frac{1}{\epsilon}\right)$, there is no $(D, \epsilon)$-feasible policy. This is so
since the buffer size under any policy $\pi$ is stochastically dominated by the one governed by the
degenerate policy $\pi_1 \equiv 1$. Hence,

$$p^{\pi}(D) \ge p^{\pi_1}(D) = \exp(-I(R_1) D) > \epsilon.$$

Using the convention of infinite cost for infeasible policies, we obtain the result in part (b). ■



Fig. 7. Non-degenerate, zero-cost and infeasible regions for QoE metrics $(D, \epsilon)$.

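For simplicity of notation, let $\alpha_0 = I(R_0)$ and $\alpha_1 = I(R_1)$. By Theorem 1, we focus on the
region

$$\mathcal{R} = \left\{ (D, \epsilon) : \frac{1}{\alpha_1} \log\left(\frac{1}{\epsilon}\right) \le D < \frac{1}{\alpha_0} \log\left(\frac{1}{\epsilon}\right) \right\} \qquad (7)$$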

to analyze the expected cost of various classes of control policies. Figure 7 illustrates a conceptual 
example of this non-degenerate region as well as the zero-cost and infeasible regions. 
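The three regions of Theorem 1 can be computed directly from the two interruption exponents. The following Python sketch classifies a QoE requirement; the function names and the example rates (free rate 2, total rate 3) are hypothetical, and the exponent is obtained by a bisection that assumes the rate exceeds one.

```python
import math

def interruption_exponent(R):
    """Positive root of r + R (e^{-r} - 1) = 0, assuming R > 1 (bisection)."""
    lo, hi = math.log(R), R
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + R * (math.exp(-mid) - 1.0) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def classify(D, eps, R0, R1):
    """Return the region of Theorem 1 that the requirement (D, eps) falls into."""
    a0, a1 = interruption_exponent(R0), interruption_exponent(R1)
    need = math.log(1.0 / eps)
    if D >= need / a0:
        return "zero-cost"        # the free server alone meets the target
    if D < need / a1:
        return "infeasible"       # even using both servers all the time fails
    return "non-degenerate"       # the costly server is needed part of the time

regions = [classify(D, 0.01, 2.0, 3.0) for D in (1.0, 2.0, 3.0)]
```

For these example rates and a 1% interruption target, an initial buffer of one packet is infeasible, two packets fall in the non-degenerate region, and three packets already suffice with the free server alone.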

IV. Design and Analysis of Association Policies 

In this part, we propose several classes of parameterized control policies. We first characterize 
the range of the parameters for which the association policy is feasible for a given initial buffer
size $D$ and the desired level of interruption probability $\epsilon$. Then, we choose the parameters
such that the expected cost of the policy is minimized. All of the proofs of the main theorems
are included in Appendix A.

A. Off-line Policy

Consider the class of policies where the decisions are made off-line before starting the media 
streaming. In this case, the arrival process is not observable by the decision maker. Therefore,
the user's decision space reduces to the set of deterministic functions $\pi : \mathbb{R} \to \{0, 1\}$ that map
time into the action space.

Theorem 2. Let the cost of a control policy be defined as in (4). In order to find a minimum-cost
off-line policy, it is sufficient to consider policies of the form

$$\pi(h_t) = \begin{cases} 1, & \text{if } t \le t_s, \\ 0, & \text{if } t > t_s, \end{cases} \qquad (8)$$

which are parameterized by a single parameter $t_s \ge 0$.

Proof: In general, any off-line policy $\pi$ consists of multiple intervals in which the costly
server is used. Consider an alternative policy $\pi'$ of the form (8) where $t_s = J^{\pi}$. By the definition
of the cost function in (4), the two policies incur the same cost. Moreover, the buffer size process
under policy $\pi$ is stochastically dominated by the one under policy $\pi'$, because the policy $\pi'$
counts the arrivals from the costly server earlier, and the arrival process is stationary. Hence,
the interruption probability of $\pi'$ is not larger than that of $\pi$. Therefore, for any off-line policy,
there exists another off-line policy of the form given by (8) with the same cost and an interruption
probability that is no larger. ■

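Theorem 3. Consider the class of off-line policies of the form (8). For any $(D,\epsilon) \in \mathcal{R}$, the
policy $\pi$ defined in (8) is feasible if

$$t_s \ge t_s^* = \frac{R_0}{R_1 - R_0}\left[\frac{1}{\alpha_0}\log\Big(\frac{1}{\epsilon - e^{-\alpha_1 D}}\Big) - D\right]. \tag{9}$$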

Note that obtaining the optimal off-line policy is equivalent to finding the smallest $t_s$ for which
the policy is still feasible. Therefore, $t_s^*$ given in (9) provides an upper bound on the minimum
cost of an off-line policy. Observe that $t_s^*$ is almost linear in $D$ for all $(D,\epsilon)$ that are not too close
to the lower boundary of region $\mathcal{R}$. As $(D,\epsilon)$ gets closer to the boundary, $t_s^*$ and the expected
cost grow to infinity, which is in agreement with Theorem 1. In this work, we pick $t_s^*$ as a
benchmark for comparison to other policies that we present next.

B. Online Safe Policy

Let us now consider the class of online policies where the decision maker can observe the 
buffer size history. Inspired by the structure of the optimal off-line policies, we first focus on 
a safe control policy in which, in order to avoid interruptions, the costly server is used at the 
beginning until the buffer size reaches a certain threshold, after which the costly server is never 
used. This policy is formally defined below. 

Definition 3. The online safe policy $\pi^S$ parameterized by the threshold value $S$ is given by

$$\pi^S(h_t) = \begin{cases} 1, & \text{if } t \le \tau_S \\ 0, & \text{if } t > \tau_S, \end{cases} \tag{10}$$

where $\tau_S = \inf\{t \ge 0 : Q_t \ge S\}$.

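Theorem 4. Let $\pi^S$ be the safe policy defined in Definition 3. For any $(D,\epsilon) \in \mathcal{R}$, the safe
policy is feasible if

$$S \ge S^* = \frac{1}{\alpha_0}\log\Big(\frac{1}{\epsilon - e^{-\alpha_1 D}}\Big). \tag{11}$$

Moreover,

$$\min_{S \ge S^*} J^{\pi^S}(D,\epsilon) = J^{\pi^{S^*}}(D,\epsilon) = \frac{1}{R_1 - 1}\left[\frac{1}{\alpha_0}\log\Big(\frac{1}{\epsilon - e^{-\alpha_1 D}}\Big) - D + \xi\right],$$

where $\xi \in [0, 1)$.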

Let us now compare the online safe policy $\pi^{S^*}$ with the off-line policy defined in (8) with
parameter $t_s^*$ as in (9). We observe that the cost of the online safe policy is almost proportional
to that of the off-line policy, where the cost ratio of the off-line policy to that of the online safe
policy is given by

$$\frac{R_0 (R_1 - 1)}{R_1 - R_0} = 1 + \frac{R_1 (R_0 - 1)}{R_1 - R_0} > 1.$$

Note that the structure of both policies is the same, i.e., both policies use the costly server for a
certain period of time and then switch back to the free server. As suggested here, observing the
buffer size allows the online policies to avoid excessive use of the costly server when there is a
sufficiently large number of arrivals from the free server. In the following, we present another
class of online policies.
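
As a numerical illustration of (9) and (11), the sketch below evaluates the off-line switching time, the safe threshold and the resulting costs. It is a minimal example under stated assumptions: $I(R)$ is taken to be the positive root of $\gamma(r) = r + R(e^{-r} - 1)$ (see Appendix A), the overshoot term $\xi$ of Theorem 4 is set to zero, and the helper names are hypothetical. The parameter values are those of the performance comparison in Section IV-D.

```python
import math

def rate_exponent(R, tol=1e-12):
    """Positive root of gamma(r) = r + R*(exp(-r) - 1), assuming R > 1."""
    lo, hi = math.log(R), R          # gamma(lo) < 0 < gamma(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + R * math.expm1(-mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def offline_and_safe(D, eps, R0, R1):
    """Return (S*, t_s*, safe cost with xi = 0) for (D, eps) inside the region (7)."""
    a0, a1 = rate_exponent(R0), rate_exponent(R1)
    s_star = math.log(1.0 / (eps - math.exp(-a1 * D))) / a0   # threshold (11)
    t_star = R0 / (R1 - R0) * (s_star - D)                    # switching time (9)
    safe_cost = (s_star - D) / (R1 - 1.0)                     # Theorem 4, xi = 0
    return s_star, t_star, safe_cost

R0, R1 = 1.05, 1.20
s_star, t_star, safe_cost = offline_and_safe(D=20.0, eps=1e-3, R0=R0, R1=R1)
# Costs are in units of time; multiplying by R1 - R0 converts them to packets.
print(t_star * (R1 - R0), safe_cost * (R1 - R0), t_star / safe_cost)
```

With $\xi = 0$ the ratio of the two costs equals $R_0(R_1 - 1)/(R_1 - R_0)$ exactly, which is 1.4 for these rates.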
C. Online Risky Policy

In this part, we study a class of online policies where the costly server is used only if the 
buffer size is below a certain threshold. We call such policies "risky" as the risk of interruption 
is spread out across the whole trajectory, unlike the "safe" policies. Further, we constrain risky 
policies to possess the property that the action at a particular time should only depend on the 
buffer size at that time, i.e., such policies are stationary Markov with respect to buffer size as 
the state of the system. The risky policy is formally defined below. 

Definition 4. The online risky policy $\pi^T$ parameterized by the threshold value $T$ is given by

$$\pi^T(Q_t) = \begin{cases} 1, & \text{if } 0 < Q_t < T \\ 0, & \text{otherwise.} \end{cases} \tag{12}$$

Theorem 5. Let $\pi^T$ be the risky policy defined in Definition 4. For any $(D,\epsilon) \in \mathcal{R}$, the policy
$\pi^T$ is feasible if the threshold $T$ satisfies



[log(f)-ao^], ifD>D, 



^ ^^og '^^t%J~' ), ifD<D, 



a\ 



where $\beta = \frac{\alpha_1}{\alpha_0 (1 - \alpha_0/2)}$ and $\bar{D} = \frac{1}{\alpha_1}\log\big(\frac{\beta}{\epsilon}\big)$.

Theorem 5 facilitates the design of risky policies with a single-threshold structure, for any
desired initial buffer size $D$ and interruption probability $\epsilon$. For a fixed $\epsilon$, when $D$ increases,
$T^*$ (the design given by Theorem 5) decreases to zero. On the other hand, if $D$ decreases to
$\frac{1}{\alpha_1}\log\big(\frac{1}{\epsilon}\big)$ (the boundary of $\mathcal{R}$), the threshold $T^*$ quickly increases to infinity, i.e., the policy
does not switch back to the free server unless a sufficiently large number of packets is buffered.
Figure 8 plots $T^*$ and $D$ as a function of $D$ for a fixed $\epsilon$. Observe that, for a large range of $D$,
$T^* < D$, i.e., the costly server is not initially used. In this range, owing to the positive drift of
$Q_t$, the probability of ever using the costly server decreases exponentially in $(D - T^*)$.

Next, we compute bounds on the expected cost of the online risky policy and compare with 
the previously proposed policies. 

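Theorem 6. For any $(D,\epsilon) \in \mathcal{R}$, consider an online risky policy $\pi^{T^*}$ defined in Definition 4,
where the threshold $T^*$ is given by (13) as a function of $D$ and $\epsilon$. If $D \ge \bar{D}$, then

$$J^{\pi^{T^*}}(D,\epsilon) \le \frac{\beta}{\alpha_1 (R_1 - 1)}\, e^{-\alpha_0 (D - T^*)}, \tag{14}$$

and if $D \le \bar{D}$, then

$$J^{\pi^{T^*}}(D,\epsilon) \le \frac{1 - e^{-\alpha_1 D}}{(R_1 - 1)(1 - e^{-\alpha_1 T^*})}\Big(T^* + 1 + \frac{\beta}{\alpha_1}\Big) - \frac{D}{R_1 - 1}, \tag{15}$$

where $\beta = \frac{\alpha_1}{\alpha_0 (1 - \alpha_0/2)}$ and $\bar{D} = \frac{1}{\alpha_1}\log\big(\frac{\beta}{\epsilon}\big)$.

Fig. 8. The switching threshold of the online risky policy as a function of the initial buffer size for a fixed
interruption probability $\epsilon$ (see Theorem 5).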
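
The bound (14) can be evaluated as follows. This is a sketch under stated assumptions rather than the procedure used for the figures: it covers only the case $D \ge \bar{D}$, for which the threshold is taken to be $T^* = \frac{1}{\alpha_1 - \alpha_0}\big[\log(\beta/\epsilon) - \alpha_0 D\big]$, and it assumes that $I(R)$ is the positive root of $\gamma(r) = r + R(e^{-r} - 1)$. The function names are hypothetical.

```python
import math

def rate_exponent(R, tol=1e-12):
    """Positive root of gamma(r) = r + R*(exp(-r) - 1), assuming R > 1."""
    lo, hi = math.log(R), R          # gamma(lo) < 0 < gamma(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + R * math.expm1(-mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def risky_design(D, eps, R0, R1):
    """Return (D_bar, T*, cost bound (14)); only the branch D >= D_bar is sketched."""
    a0, a1 = rate_exponent(R0), rate_exponent(R1)
    beta = a1 / (a0 * (1.0 - a0 / 2.0))
    d_bar = math.log(beta / eps) / a1
    if D < d_bar:
        raise ValueError("this sketch only covers D >= D_bar")
    t_star = (math.log(beta / eps) - a0 * D) / (a1 - a0)
    bound = beta / (a1 * (R1 - 1.0)) * math.exp(-a0 * (D - t_star))
    return d_bar, t_star, bound

d_bar, t_star, bound = risky_design(D=35.0, eps=1e-3, R0=1.05, R1=1.20)
# The bound is in units of time; multiply by R1 - R0 = 0.15 to count packets.
print(d_bar, t_star, bound * 0.15)
```

For $D = 35$ and $\epsilon = 10^{-3}$ the bound corresponds to between one and two packets from the costly server, in line with the "about one extra packet" observed in the performance comparison.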



In the following, we compare the expected cost of the presented policies using numerical 
methods, and illustrate that the bounds derived in Theorems 3, 4 and 6 on the expected cost 
function are close to the exact value. 

D. Performance Comparison 

Figure 9 compares the expected cost functions of the off-line, online safe and online risky
policies as a function of the initial buffer size $D$, when the interruption probability is fixed to
$\epsilon = 10^{-3}$, the arrival rate from the free server is $R_0 = 1.05$, and the arrival rate from the costly
server is $R_c = R_1 - R_0 = 0.15$. We plot the bounds on the expected cost given by Theorems 3, 4
and 6 as well as the expected cost function numerically computed by the Monte-Carlo method.
Figure 9 shows that the analytical bounds we computed for the expected cost of various control
policies closely match the exact cost functions computed via simulations.

Fig. 9. Expected cost (units of time) of the presented control policies as a function of the initial buffer size for interruption
probability $\epsilon = 10^{-3}$. The analytical bounds are given by Theorems 3, 4 and 6.



Observe that the expected cost of the risky policy is significantly smaller than that of both the online
safe and off-line policies. For example, the risky policy allows us to decrease the initial buffer size
from 70 to 20 with an average of $70 \times 0.15 \approx 10$ extra packets from the costly server. The
expected cost in terms of the number of packets received from the costly server is 43 and 61 for
the online safe and off-line policy, respectively.

Moreover, note that it is merely the existence of the costly server as a backup that allows us
to improve the user's quality of experience without actually using too many packets from the
costly server. For example, observe that the risky policy satisfies QoE metrics of $D = 35$ and
$\epsilon = 10^{-3}$, by only using on average about one extra packet from the costly server. However,
without the costly server, in order to decrease the initial buffer size from 70 to 35, the interruption
probability has to increase from $10^{-3}$ to about 0.03 (see Lemma 1).
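
A Monte-Carlo estimate of the kind mentioned above can be sketched as follows. This is an illustrative simulation, not the code behind Figure 9. It assumes unit-rate playback, Poisson arrivals of rate $R_1$ while the buffer is below the threshold $T$ and of rate $R_0$ otherwise, and it truncates each sample path once the buffer reaches a cap `q_cap`, treating such paths as uninterrupted. The function names, the cap and the number of runs are all hypothetical choices.

```python
import math
import random

def simulate_path(D, T, R0, R1, q_cap, rng):
    """Simulate one buffer trajectory under the threshold policy.

    Arrivals are proposed at rate R1 and accepted with probability R0/R1
    whenever the buffer is at or above T (Poisson thinning).
    Returns (interrupted, time spent below the threshold).
    """
    q, cost = float(D), 0.0
    while q < q_cap:
        dt = rng.expovariate(R1)           # time until the next proposed arrival
        if dt >= q:                        # the buffer empties first: interruption
            return True, cost + min(q, T)
        # The buffer drains linearly from q to q - dt; count the time below T.
        cost += max(0.0, min(q, T) - (q - dt))
        q -= dt
        rate = R1 if q < T else R0
        if rng.random() < rate / R1:
            q += 1.0                       # one packet arrives
    return False, cost

def estimate(D, T, R0=1.05, R1=1.20, runs=2000, q_cap=50.0, seed=0):
    """Estimate the interruption probability and the mean time below T."""
    rng = random.Random(seed)
    results = [simulate_path(D, T, R0, R1, q_cap, rng) for _ in range(runs)]
    p_int = sum(1 for hit, _ in results if hit) / runs
    mean_cost = sum(c for _, c in results) / runs
    return p_int, mean_cost

p_free, cost_free = estimate(D=10.0, T=0.0)    # free server only
p_risky, cost_risky = estimate(D=10.0, T=5.0)  # costly server used below T = 5
print(p_free, p_risky, cost_risky)
```

With $T = 0$ the costly server is never used, so the estimate should be close to the single-server interruption probability $e^{-\alpha_0 D}$ of Lemma 1, up to sampling and truncation error.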

V. Dynamic Programming Approach 

In this section, we present a characterization of the optimal association policy in terms of the
Hamilton-Jacobi-Bellman (HJB) equation. Note that because of the probabilistic constraint over
the space of sample paths of the buffer size, the optimal policy is not necessarily Markov with
respect to the buffer size as the state of the system. We take a similar approach as in [6] where,
by expanding the state space, a Bellman equation is provided as the optimality condition of an
MDP with probabilistic constraint. In particular, consider the pair $(Q_t, p_t)$ as the state variable,
where $Q_t$ denotes the buffer size and $p_t$ represents the desired level of interruption probability
given the information at time $t$. Note that $p_t$ is a Martingale by definition [28]. The evolution
of $Q_t$ is governed by the following stochastic differential equation

$$dQ_t = -dt + dN_t^u, \qquad Q_0 = D, \tag{16}$$

where $N^u$ is a Poisson counter with rate $R_{u_t} = R_0 + u_t \cdot R_c$. For any $(D,\epsilon) \in \mathcal{R}$ and any
optimal policy $\pi$, the constraint $p^{\pi}(D) \le \epsilon$ is active. Hence, we consider the sample paths of $p_t$
such that $p_0 = \epsilon$. Moreover, we have $E[p_t] = \epsilon$ for all $t$, where the expectation is with respect
to the Poisson jumps. Let $dp_t = \hat{p}_t - p_t$ be the change in state $p$ if a Poisson jump occurs in
an infinitesimal interval of length $dt$. Also, let $dp_t = dp_0$ be the change in state $p$ if no jump
occurs. Therefore,

$$0 = E[dp_t] = R_{u_t} dt\, (\hat{p}_t - p_t) + (1 - R_{u_t} dt)\, dp_0.$$

By solving the above equation for $dp_0$, we obtain the evolution of $p_t$ as a function of the control
processes $\hat{p}_t$ and $u_t$:

$$dp_t = (p_t - \hat{p}_t)(R_{u_t} dt - dN_t^u), \qquad p_0 = \epsilon. \tag{17}$$

Similarly to the arguments of Theorem 2 of [6], by the principle of optimality we can write the
following dynamic programming equation

$$V(Q,p) = \min_{u \in \{0,1\},\, \hat{p} \in [0,1]} \Big\{ u\, dt + E\big[V(Q + dQ, p + dp)\big] \Big\}. \tag{18}$$

If $V$ is continuously differentiable, by Ito's Lemma for jump processes, we have

$$V(Q + dQ, p + dp) - V(Q,p) = \frac{\partial V}{\partial Q}(-dt) + \frac{\partial V}{\partial p}(p - \hat{p}) R_u\, dt + \big(V(Q + 1, \hat{p}) - V(Q,p)\big)\, dN^u,$$

which implies the following HJB equation after dividing (18) by $dt$ and taking the limit as $dt$
goes to zero:

$$\frac{\partial V(Q,p)}{\partial Q} = \min_{u \in \{0,1\},\, \hat{p} \in [0,1]} \Big\{ u + \frac{\partial V}{\partial p}(p - \hat{p}) R_u + R_u \big(V(Q + 1, \hat{p}) - V(Q,p)\big) \Big\}. \tag{19}$$

The optimal policy $\pi$ is obtained by characterizing the optimal solution of the partial differential
equation in (19) together with the boundary condition $V(Q, 1) = 0$. Since such equations
are in general difficult to solve analytically, we use the guess and check approach, where we
propose a candidate for the value function and verify that it approximately satisfies the HJB equation
almost everywhere. For any $(Q,p) \in \mathcal{R}$, define
r ^hog('-)-aoQ], i{Q>^\og('-), 
\ ^^^^^ y'^pIe-"iQ j^ otherwise, 
where = ^. The candidate solution for HJB equation (19) is given by 

when $Q \ge \frac{1}{\alpha_1}\log\big(\frac{\beta}{p}\big)$, and

when $Q \le \frac{1}{\alpha_1}\log\big(\frac{\beta}{p}\big)$. Note that the candidate solution is derived from the structure of the
expected cost of the risky policy (cf. Theorem 6). We may verify that $V$ satisfies the HJB
equation (19) for all $(Q,p)$ such that $Q \ge \frac{1}{\alpha_1}\log\big(\frac{\beta}{p}\big)$ or $Q \le \frac{1}{\alpha_1}\log\big(\frac{\beta}{p}\big) - 1$, but for other $(Q,p)$
the HJB equation is only approximately satisfied. This is due to the discontinuity of the queue-length
process, which does not allow us to exactly match the expected cost starting from below
the threshold with the one starting from above the threshold. Therefore, owing to the approximate
characterization of the cost of the risky policy, we can neither prove nor disprove optimality of this
policy.

In the following, we use a fluid model to provide an exact characterization of the optimal
control policy using an appropriate HJB equation. We show that the optimal policy takes a threshold
structure similar to the online risky policy.

VI. Optimal Association Policy for a Fluid Model 

Thus far, we concentrated on the design and analysis of various network association policies
in an uncertain environment, where network uncertainties are modeled using a Poisson arrival
process. We provided closed-form approximations of the cost of different policies. However, an
exact analytical solution is required to prove optimality of the risky policy. This is particularly
challenging, since an exact distribution of threshold overshoots is required, owing to the discontinuous
nature of the Poisson process. In this part, we exploit a second-order approximation
of the Poisson process [29] and model the receiver's buffer size using a controlled Brownian
motion with drift.

Consider the system model as in Figure 6, with the following queue-length dynamics at the
receiver:

$$dQ_t = (R_{u_t} - 1)\, dt + dW_t, \qquad Q_0 = D, \tag{22}$$

where $W_t$ is the Wiener process and $u_t \in \{0, 1\}$ is the receiver's decision at time $t$ on using the
free or costly server. As in the preceding part, we assume that the media file size $F$ is infinite
and $R_1 > R_0 > 1$.

Define the control policy (network association policy) as in Definition 1. The goal is to find a
feasible policy that minimizes the usage cost defined in (4) such that the interruption probability
$p^{\pi}(D)$ defined in (3) is at most $\epsilon$. As in the previous part, the set of feasible policies and the value
function are given by Definition 2 and (5), respectively. The following lemma is the counterpart
of Lemma 1 for the fluid model.

Lemma 2. Let $\pi_i \equiv i$ denote a degenerate policy, for $i \in \{0, 1\}$. The interruption probability
for such a policy is given by

$$p^{\pi_i}(D) = e^{-\theta_i D}, \quad \text{for all } D \ge 0, \tag{23}$$

where $\theta_i = 2(R_i - 1)$, for $i \in \{0, 1\}$.

Proof: See Appendix B. ■

First, we provide a characterization of the optimal policy via a Hamilton-Jacobi-Bellman
equation. As in Section V, we expand the state variables to $(Q,p)$, where $Q$ is the queue-length
with dynamics given by (22), and $p$ is the desired interruption probability. Using the Martingale
representation theorem [28], we may write the dynamics of $p$ as follows:

$$dp_t = \hat{p}_t\, dW_t, \qquad p_0 = \epsilon, \tag{24}$$

where $\hat{p}_t$ is a predictable process which is adapted with respect to the natural filtration of the history
process. In this work, we focus on the control processes that are Markovian with respect to the
state process $(Q_t, p_t)$. Therefore, using the principle of optimality, we may write the following
dynamic programming equation:

$$V(Q,p) = \min_{(u,\hat{p}) \in \{0,1\} \times \mathbb{R}} \Big\{ u\, dt + E\big[V(Q + dQ, p + dp)\big] \Big\}, \tag{25}$$

where $(u,\hat{p})$ are the control actions. For a twice differentiable function $V$, we may exploit Ito's
Lemma to get

$$V(Q + dQ, p + dp) - V(Q,p) \overset{(a)}{=} \frac{\partial V}{\partial Q}\big((R_u - 1)\, dt + dW\big) + \frac{\partial V}{\partial p}(\hat{p}\, dW) + \frac{1}{2}\frac{\partial^2 V}{\partial Q^2}\, dt + \frac{1}{2}\frac{\partial^2 V}{\partial p^2}\hat{p}^2\, dt + \frac{\partial^2 V}{\partial Q \partial p}\hat{p}\, dt,$$

where (a) follows from the state dynamics in (22) and (24). Replacing the above equation back in
(25), taking the expectation with respect to $dW$, and taking the limit as $dt$ tends to zero, we obtain
the following HJB equation:

$$0 = \min_{(u,\hat{p})} \Big\{ u + \frac{\partial V}{\partial Q}(R_u - 1) + \frac{1}{2}\frac{\partial^2 V}{\partial Q^2} + \frac{1}{2}\frac{\partial^2 V}{\partial p^2}\hat{p}^2 + \frac{\partial^2 V}{\partial Q \partial p}\hat{p} \Big\}. \tag{26}$$

Note that we require the following boundary conditions for the value function:

$$V(Q, 1) = V(0, p) = 0, \quad \text{for all } Q \ge 0,\ 0 \le p \le 1. \tag{27}$$

Providing an analytical solution for the partial differential equation in (26) is often challenging.
However, we may take a guess and check approach and use a threshold policy as the basis of our
guess. Note that we need to verify the HJB equation only for the set of state variables that are reachable
by a feasible policy. In particular, in light of Lemma 2, it is clear that for $p < e^{-\theta_1 Q}$, there is
no feasible policy and the value function is $V(Q,p) = \infty$. Moreover, for all $p \ge e^{-\theta_0 Q}$, observe
that the degenerate policy $\pi_0 \equiv 0$ is optimal, with $V(Q,p) = 0$, which also satisfies the HJB equation
and the boundary conditions. Therefore, we focus on the non-degenerate region

$$\mathcal{R} = \big\{ (Q,p) : Q \ge 0,\ e^{-\theta_1 Q} \le p \le e^{-\theta_0 Q} \big\}. \tag{28}$$

Figure 7 illustrates a conceptual example of this non-degenerate region. In the following, 
we first define a threshold policy similar to the risky policy of Definition 4, and present a 
closed-form characterization of its cost function. Then, we show that, for a proper choice of the 
threshold the associated cost function satisfies the HJB equation in (26), and the optimal solution 
of the minimization problem in (26) coincides with the threshold policy. Hence, we establish 
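the optimality of the proposed policy.

Theorem 7. Let $\pi^T$ be the threshold policy as in Definition 4, parameterized with threshold value
$T$. Also, let the queue-length dynamics be governed by (22). Then, the interruption probability
for this policy is given by

$$p^T(D) = \begin{cases} p(T)\, e^{-\theta_0 (D - T)}, & D \ge T \\[4pt] \dfrac{e^{-\theta_1 D} - e^{-\theta_1 T} + p(T)\big(1 - e^{-\theta_1 D}\big)}{1 - e^{-\theta_1 T}}, & D < T, \end{cases} \tag{29}$$

where

$$p(T) = \frac{\theta_1 e^{-\theta_1 T}}{\theta_0 + (\theta_1 - \theta_0)\, e^{-\theta_1 T}}$$

and $\theta_i = 2(R_i - 1)$, for $i \in \{0, 1\}$.

Proof: See Appendix B. ■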

Corollary 1. Let $\pi^T$ be the threshold policy as in Definition 4. Then, the policy $\pi^T$ is $(D,\epsilon)$-feasible
(cf. Definition 2) for the following choices of the threshold $T$:

1) For all $\epsilon \ge e^{-\theta_0 D}$, let $T = 0$.

2) For all $e^{-\theta_1 D} < \epsilon < e^{-\theta_0 D}$, let $T = T(D,\epsilon)$ be the unique solution of $p^T(D) = \epsilon$, where
$p^T(D)$ is given by (29).

3) For all other $\epsilon$, there exists no such $T$.

Proof: The proof directly follows from the characterization of the interruption probability
in Theorem 7, noting the fact that $p^T(D) \in [e^{-\theta_1 D}, e^{-\theta_0 D}]$ is monotonically decreasing in $T$. ■

Next, we provide an exact characterization of the expected cost of the threshold policy $\pi^T$ for
a given threshold $T$. This allows us to obtain a proper candidate solution for the HJB equation.

Theorem 8. Let $\pi^T$ be the threshold policy as in Definition 4. Define $J^T(D)$ as the expected
cost associated with policy $\pi^T$ given the initial condition $D$ for queue-length dynamics (22) and
threshold $T$. The cost-to-go function $J^T(D)$ is given by

$$J^T(D) = \begin{cases} e^{-\theta_0 (D - T)}\, J(T), & D \ge T \\[4pt] \Big(J(T) + \dfrac{2T}{\theta_1}\Big)\dfrac{1 - e^{-\theta_1 D}}{1 - e^{-\theta_1 T}} - \dfrac{2D}{\theta_1}, & D < T, \end{cases} \tag{30}$$

where

$$J(T) = \frac{2}{\theta_1} \cdot \frac{1 - (1 + \theta_1 T)\, e^{-\theta_1 T}}{\theta_0 + (\theta_1 - \theta_0)\, e^{-\theta_1 T}}. \tag{31}$$

Proof: See Appendix B.

The following theorem provides a candidate for the value function and verifies the optimality
condition given by the HJB equation in (26).

Theorem 9. For all $(Q,p) \in \mathcal{R}$, define

$$V(Q,p) = J^{T(Q,p)}(Q), \tag{32}$$

where $J^T(\cdot)$ is defined in (30), $\mathcal{R}$ is defined in (28), and $T(Q,p)$ is the unique solution of
$p^T(Q) = p$. Then, the HJB equation (26) and boundary condition (27) hold for all $(Q,p) \in \mathcal{R}$.

Proof: See Appendix B. ■
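
The candidate value function of Theorem 9 can be evaluated numerically by first solving $p^T(Q) = p$ for the threshold (Corollary 1) and then computing the cost-to-go. The sketch below is illustrative and rests on assumptions that should be checked against Appendix B: the closed forms used for $p(T)$ and $J(T)$ were obtained by requiring the interruption probability and the cost to be continuously differentiable at the threshold, and all function names are hypothetical.

```python
import math

def interruption_prob(D, T, th0, th1):
    """Interruption probability of the threshold policy in the fluid model."""
    x = math.exp(-th1 * T)
    p_T = th1 * x / (th0 + (th1 - th0) * x)       # value at D = T (assumed form)
    if D >= T:
        return p_T * math.exp(-th0 * (D - T))
    y = math.exp(-th1 * D)
    return (y - x + p_T * (1.0 - y)) / (1.0 - x)

def expected_cost(D, T, th0, th1):
    """Expected time on the costly server for initial buffer D and threshold T."""
    x = math.exp(-th1 * T)
    J_T = (2.0 / th1) * (1.0 - (1.0 + th1 * T) * x) / (th0 + (th1 - th0) * x)
    if D >= T:
        return J_T * math.exp(-th0 * (D - T))
    return (J_T + 2.0 * T / th1) * (1.0 - math.exp(-th1 * D)) / (1.0 - x) - 2.0 * D / th1

def solve_threshold(D, eps, th0, th1):
    """Threshold T(D, eps) of Corollary 1; returns None when no threshold is feasible."""
    if eps >= math.exp(-th0 * D):
        return 0.0                                 # the free server alone suffices
    if eps <= math.exp(-th1 * D):
        return None                                # infeasible even with the costly server
    lo, hi = 0.0, 1.0
    while interruption_prob(D, hi, th0, th1) > eps:
        hi *= 2.0                                  # the probability decreases in T
    for _ in range(200):                           # bisection on T
        mid = 0.5 * (lo + hi)
        if interruption_prob(D, mid, th0, th1) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

th0, th1 = 2 * (1.05 - 1.0), 2 * (1.20 - 1.0)     # theta_i = 2 (R_i - 1)
T = solve_threshold(20.0, 1e-3, th0, th1)
print(T, expected_cost(20.0, T, th0, th1))
```

Bisection is used here only because the interruption probability is monotonically decreasing in $T$; any scalar root finder would serve equally well.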

Theorem 9 verifies that the value function $V(Q,p)$ given by (32) is indeed the optimal cost
function defined in (5). Furthermore, we can conclude that the policy $\pi^*(Q,p)$ achieving the
minimum in the HJB equation (26) is optimal. In general, the optimal policy depends on both
state variables $(Q,p)$ and is not Markov with respect to $Q$. In the following, we show that the
state trajectory steered by the optimal policy is limited to a one-dimensional manifold and the
threshold policy $\pi^T$ is optimal for all $(Q,p) \in \mathcal{R}$. Recall that the policy $\pi^T$ reduces to the optimal
policy $\pi_0 \equiv 0$ for all other admissible states, by using the threshold value $T = 0$.

Theorem 10. Let $\pi^*(Q,p)$ attain the minimum in the HJB equation (26) for any $(Q,p) \in \mathcal{R}$.
Let $(Q_t^*, p_t^*)$ denote the state trajectory given the initial condition $(D,\epsilon)$, under the control
trajectory $(u_t^*, \hat{p}_t^*) = \pi^*(Q_t^*, p_t^*)$. Then, the state trajectory is limited to a one-dimensional
invariant manifold $\mathcal{M}(D,\epsilon)$, where

$$\mathcal{M}(D,\epsilon) = \big\{ (Q,p) : p = p^{T(D,\epsilon)}(Q) \big\}, \tag{33}$$

where $T(D,\epsilon)$ is the solution of $p^T(D) = \epsilon$, and $p^T(\cdot)$ is defined in (29). Moreover, the optimal
policy $\pi^*(Q,p)$ coincides with the threshold policy $\pi^{T(D,\epsilon)}(Q)$.

Proof: See Appendix B. ■

Figure 10 gives a conceptual illustration of the intuition behind Theorem 10. Observe
that the optimal policy satisfying the HJB equation divides the feasible state space into two
sub-regions corresponding to $u^* = 0$ and $u^* = 1$, i.e., the policy switches the costly server on or off
when the state of the system crosses the boundary between these sub-regions. Theorem 10 states
that for any initial condition $(Q_0, p_0) = (D,\epsilon)$ the state trajectory lies on a one-dimensional
manifold $\mathcal{M}(D,\epsilon)$. Figure 10 illustrates these manifolds for different initial conditions. Since
the state trajectory is limited to a one-dimensional space, the decision of switching to the costly
server merely depends on the queue-length process. The proof of Theorem 10 in Appendix B
shows that the queue-length at the switch point for each manifold $\mathcal{M}(D,\epsilon)$ coincides with the
threshold $T(D,\epsilon)$ specified in Corollary 1.

Fig. 10. Trajectory of the optimal policy lies on a one-dimensional manifold.

VII. Conclusions and Future Work

We presented a new framework for studying media streaming systems in volatile environments, 
with focus on quality of user experience. We proposed two intuitive metrics that essentially 
capture the notion of delay from the end-user's point of view. The proposed metrics in the 
context of media streaming, are initial buffering delay, and probability of interruption in media 
playback. These metrics are tractable enough to be used as a benchmark for system design. 

We first addressed the problem of streaming in a technology-heterogeneous multi-server
system. The main challenge in multi-server systems is the inefficiency of multi-path streaming
due to duplicate packet reception. This issue can also significantly complicate the analysis.
We proposed random linear network coding as the solution to this challenge. By sending
random linear combinations of packets, we remove the notion of identity from packets and hence
guarantee that no packet is redundant. Using this approach allows us to significantly simplify the
flow control of multi-path streaming scenarios, and model heterogeneous multi-server systems
as a single-server system.

Equipped with tools provided by network coding, we added another level of complexity to 
the multi-server system. We used our framework to study multi-server systems when the access 
cost varies across different servers. Our objective was to investigate the trade-offs between 
the network usage cost and the user's QoE requirements parameterized by initial waiting time 
and allowable probability of interruption in media playback. For a Poisson arrival model, we 
analytically characterized and compared the expected cost of both off-line and online policies, 
finally showing that a threshold-based online risky policy achieves the lowest cost. The threshold 
policy uses the costly server if and only if the receiver's buffer is below a certain threshold. 
Moreover, we observed that even rare but properly timed usage of alternative access technologies 
significantly improves user experience without any bandwidth over-provisioning. 

We formulated the access cost minimization problem as a Markov decision problem with 
probabilistic constraints, and characterized the optimal policy by the HJB equation. For a fluid 
approximation model, we established the optimality of a threshold-based online policy in the 
class of deterministic Markov policies using the HJB equation as a verification method. 

The framework that we have developed in this work can also be used to design adaptive 
resolution streaming systems that not only depend on the channel conditions but also on the 
delay requirement of the application, which is captured by the queue-length at the receiver. 
As for other extensions of this work, we would like to study more accurate models of channel 
variations such as the two-state Markov model due to Gilbert and Elliott. In this work we focused
on deterministic network association policies. Another extension of this work would consist of 
studying randomized control policies. 

Appendix A 
Analysis of the Control Policies for the Poisson Arrival Model 

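Proof of Theorem 3. By Definition 2, we need to show that $p^{\pi}(D) \le \epsilon$. By a union bound on
the interruption probability, it is sufficient to verify

$$\Pr\Big(\min_{0 \le t \le t_s} Q_t \le 0 \,\Big|\, Q_0 = D\Big) + \Pr\Big(\min_{t \ge t_s} Q_t \le 0 \,\Big|\, Q_0 = D\Big) \le \epsilon. \tag{34}$$

In the interval $[0, t_s]$, $Q_t$ behaves as in a single-server system with rate $R_1$. Hence, by Lemma
1 we get

$$\Pr\Big(\min_{0 \le t \le t_s} Q_t \le 0 \,\Big|\, Q_0 = D\Big) \le e^{-\alpha_1 D}. \tag{35}$$

For the second term in (34), we have

$$\begin{aligned}
\Pr\Big(\min_{t \ge t_s} Q_t \le 0 \,\Big|\, Q_0 = D\Big) &= \sum_{q = D - t_s}^{\infty} \Pr\Big(\min_{t \ge t_s} Q_t \le 0 \,\Big|\, Q_{t_s} = q\Big) \Pr(Q_{t_s} = q) \\
&\overset{(a)}{\le} \sum_{q = D - t_s}^{\infty} e^{-\alpha_0 q}\, \Pr(Q_{t_s} = q) \\
&= \sum_{k=0}^{\infty} e^{-\alpha_0 (D - t_s + k)}\, \Pr\big(N_{t_s} + N^c_{t_s} = k\big) \\
&\overset{(b)}{=} e^{-\alpha_0 (D - t_s)} \sum_{k=0}^{\infty} e^{-R_1 t_s} \frac{(R_1 t_s)^k}{k!}\, e^{-\alpha_0 k} \\
&= e^{-\alpha_0 (D - t_s)}\, e^{-R_1 t_s} \sum_{k=0}^{\infty} \frac{(R_1 t_s e^{-\alpha_0})^k}{k!} \\
&= \exp\big(-\alpha_0 (D - t_s) + R_1 t_s (e^{-\alpha_0} - 1)\big) \\
&\overset{(c)}{=} \exp\Big(-\alpha_0 (D - t_s) + R_1 t_s \Big(-\frac{\alpha_0}{R_0}\Big)\Big) \\
&\overset{(d)}{\le} \epsilon - e^{-\alpha_1 D},
\end{aligned}$$

where (a) follows from Lemma 1 and the fact that $u_t = 0$ for $t \ge t_s$; (b) is true because
$N_{t_s} + N^c_{t_s}$ is a Poisson random variable with mean $R_1 t_s$; (c) holds since $\alpha_0 = I(R_0)$ is the root
of $\gamma(r) = r + R_0 (e^{-r} - 1)$. Finally, (d) follows from the hypothesis of the theorem.

By combining the above bounds, we may verify (34), which in turn proves feasibility of the
proposed control policy. ■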

Proof of Theorem 4. Similarly to the proof of Theorem 3, we need to show that the total
probability of interruption before and after crossing the threshold $S$ is bounded from above by
$\epsilon$. Observe that for any realization of $\tau_S$ the bound in (35) still holds. Further, since the costly
server is not used after crossing the threshold and $Q_{\tau_S} \ge S$, Lemma 1 implies

$$\Pr\Big(\min_{t \ge \tau_S} Q_t \le 0 \,\Big|\, Q_0 = D\Big) \le e^{-\alpha_0 S} \le \epsilon - e^{-\alpha_1 D}, \tag{36}$$

where the second inequality follows from (11). Finally, combining (35) and (36) gives $p^{\pi^S}(D) \le \epsilon$,
which is the desired feasibility result.

For the second part, first observe that $J^{\pi^S}(D,\epsilon) = E[\tau_S]$. In order to cross a threshold $S \ge S^*$,
the threshold $S^*$ must be crossed earlier, because $Q_0 = D < S^*$. Hence, $\tau_S$ stochastically
dominates $\tau_{S^*}$, implying

$$J^{\pi^S}(D,\epsilon) = E[\tau_S] \ge E[\tau_{S^*}] = J^{\pi^{S^*}}(D,\epsilon), \quad \text{for all } S \ge S^*.$$

It only remains to compute $E[\tau_{S^*}]$. It follows from Wald's identity or Doob's optional stopping
theorem [28] that

$$D + (R_1 - 1)\, E[\tau_{S^*}] = E[Q_{\tau_{S^*}}] = S^* + \xi, \tag{37}$$

where $\xi \in [0, 1)$ because the jumps of a Poisson process are of unit size, and hence the
overshoot size when crossing a threshold is bounded by one, i.e., $S^* \le Q_{\tau_{S^*}} < S^* + 1$.
Rearranging the terms in (37) and plugging in the value of $S^*$ from (11) immediately gives the
result. ■

Lemma 3. Let $Q_t$ be the buffer size of a single-server system with arrival rate $R > 1$. Let the
initial buffer size be $D$, and for any $T \ge D \ge 0$ define the following stopping times

$$\tau_T = \inf\{t \ge 0 : Q_t \ge T\}, \qquad \tau_e = \inf\{t \ge 0 : Q_t \le 0\}. \tag{38}$$

Then

$$\Pr(\tau_e > \tau_T) = \frac{1 - e^{-I(R) D}}{1 - E\big[e^{-I(R) Q_{\tau_T}} \,\big|\, \tau_T < \tau_e\big]}, \tag{39}$$

where $I(R)$ is defined in Lemma 1.

Proof: Let $Y(t) = e^{-I(R) Q_t}$. We may verify that $Y(t)$ is a Martingale and uniformly
integrable. Also, define the stopping time $\tau = \min\{\tau_T, \tau_e\}$. Since $R > 1$, we have
$\Pr(\tau \ge t) \le \Pr(0 < Q_t < T) \to 0$ as $t \to \infty$. Hence, $\tau < \infty$ almost surely. Therefore, we can employ
Doob's optional stopping theorem [28] to write

$$e^{-I(R) D} = E[Y(0)] = E[Y(\tau)] = \Pr(\tau_e < \tau_T) \cdot 1 + \Pr(\tau_T < \tau_e)\, E\big[e^{-I(R) Q_{\tau_T}} \,\big|\, \tau_T < \tau_e\big].$$

The claim immediately follows from the above relation after rearranging the terms. ■
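Proof of Theorem 5. Let us first characterize the interruption probability of the policy $\pi^T$
when the initial buffer size is $D = T$. In this case, by definition of $\pi^T$ the behavior of $Q_t$ is
initially the same as a single-server system with rate $R_1$ until the threshold $T$ is crossed. Hence,

$$\begin{aligned}
p^{\pi^T}(T) &= \Pr\Big(\min_{t \ge 0} Q_t \le 0 \,\Big|\, Q_0 = T\Big) \\
&= \Pr(\tau_e < \tau_T) \cdot 1 + \Pr(\tau_T < \tau_e)\, \Pr\Big(\min_{t \ge \tau_T} Q_t \le 0 \,\Big|\, \tau_T < \tau_e, Q_0 = T\Big) \\
&= \frac{e^{-\alpha_1 T} - E_\mu\big[e^{-\alpha_1 Q_{\tau_T}}\big] + (1 - e^{-\alpha_1 T})\, \Pr\big(\min_{t \ge \tau_T} Q_t \le 0 \,\big|\, \tau_T < \tau_e, Q_0 = T\big)}{1 - E_\mu\big[e^{-\alpha_1 Q_{\tau_T}}\big]},
\end{aligned} \tag{40}$$

where the last equality follows directly from Lemma 3. Further, we have

$$\begin{aligned}
\Pr\Big(\min_{t \ge \tau_T} Q_t \le 0 \,\Big|\, \tau_T < \tau_e, Q_0 = T\Big) &= \int_T^{T+1} \Pr\Big(\min_{t \ge \tau_T} Q_t \le 0 \,\Big|\, Q_{\tau_T}\Big)\, d\mu(Q_{\tau_T}) \\
&\overset{(a)}{=} \int_T^{T+1} \Pr\Big(\min_{t \ge 0} Q_t \le 0 \,\Big|\, Q_0\Big)\, d\mu(Q_0) \\
&\overset{(b)}{=} \int_T^{T+1} \Pr\Big(\min_{t \ge 0} Q_t \le 0 \,\Big|\, \min_{t \ge 0} Q_t \le T, Q_0\Big) \Pr\Big(\min_{t \ge 0} Q_t \le T \,\Big|\, Q_0\Big)\, d\mu(Q_0) \\
&\overset{(c)}{=} \int_T^{T+1} p^{\pi^T}(T)\, e^{-\alpha_0 (Q_0 - T)}\, d\mu(Q_0) \\
&= E_\mu\big[e^{-\alpha_0 (Q_{\tau_T} - T)}\big]\, p^{\pi^T}(T),
\end{aligned} \tag{41}$$

where $\mu$ denotes the conditional distribution of $Q_{\tau_T}$ given $\tau_T < \tau_e$. Note that $Q_{\tau_T} \in [T, T + 1]$
because the size of the overshoot is bounded by one. Further, (a) follows from stationarity of
the arrival processes and the control policy, and (b) holds because a necessary condition for the
interruption event is to cross the threshold $T$ when starting from a point $Q_0 \ge T$. Finally, (c)
follows from Lemma 1 and the definition of the risky policy. The relations (40) and (41) together
result in

$$p^{\pi^T}(T) = \frac{e^{-\alpha_1 T} - E_\mu\big[e^{-\alpha_1 Q_{\tau_T}}\big]}{1 - E_\mu\big[e^{-\alpha_0 (Q_{\tau_T} - T)}\big] + \kappa}, \tag{42}$$

where $\kappa = E_\mu\big[e^{-\alpha_0 Q_{\tau_T} - (\alpha_1 - \alpha_0) T}\big] - E_\mu\big[e^{-\alpha_1 Q_{\tau_T}}\big] \ge 0$. Therefore, using the fact that

$$1 - x \le e^{-x} \le 1 - x + \frac{x^2}{2}, \quad \text{for all } x \ge 0, \tag{43}$$

we can provide the following bound

$$p^{\pi^T}(T) \le \frac{\alpha_1 E_\mu[Q_{\tau_T} - T]\, e^{-\alpha_1 T}}{\alpha_0 E_\mu[Q_{\tau_T} - T]\Big(1 - \frac{\alpha_0}{2} \cdot \frac{E_\mu[(Q_{\tau_T} - T)^2]}{E_\mu[Q_{\tau_T} - T]}\Big)} \le \frac{\alpha_1}{\alpha_0 \big(1 - \frac{\alpha_0}{2}\big)}\, e^{-\alpha_1 T} = \beta e^{-\alpha_1 T}, \tag{44}$$

where the last inequality holds since $0 \le Q_{\tau_T} - T < 1$.

Now we prove feasibility of the risky policy $\pi^{T^*}$ when $D \ge \bar{D}$. Observe that by (13), $D \ge T^*$,
hence the behavior of the buffer size $Q_t$ is the same as the one in a single-server system with
rate $R_0$ until the threshold $T^*$ is crossed. Thus

$$\begin{aligned}
p^{\pi^{T^*}}(D) &= \Pr\Big(\min_{t \ge 0} Q_t \le 0 \,\Big|\, Q_0 = D\Big) \\
&= \Pr\Big(\min_{t \ge 0} Q_t \le 0 \,\Big|\, \min_{t \ge 0} Q_t \le T^*, Q_0 = D\Big) \Pr\Big(\min_{t \ge 0} Q_t \le T^* \,\Big|\, Q_0 = D\Big) \\
&= p^{\pi^{T^*}}(T^*)\, e^{-\alpha_0 (D - T^*)} \\
&\le \beta e^{-\alpha_1 T^*} e^{-\alpha_0 (D - T^*)} = \epsilon,
\end{aligned}$$

where the inequality follows from (44), and the last equality holds by (13).

Next we verify the feasibility of the policy $\pi^{T^*}$ for $D \le \bar{D}$. In this case, $D \le T^*$ and by
definition of the risky policy the system behaves as a single-server system with arrival rate $R_1$
until the threshold $T^*$ is crossed or the buffer size hits zero (interruption). Hence, we can bound
the interruption probability as follows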

p"^* (D) = Pr(re < TT.) ■ 1 + Pr(rT. < re)Prf min Qt < OWt* < r,, Qo = D 



(a) 



1 - Pr(rr. < r,) (l - E^[e""°(«^T. -^*)]p-"* (T*) 



-aiD 



(c) 1 _ p-aiD , 
where (a) follows from (41), (b) is a direct consequence of Lemma 3, (c) is a result of (44), (d)
may be verified by noting that $\alpha_0 = I(R_0)$, $\alpha_1 = I(R_1)$ and $R_1 > R_0$, and (e) holds since $\beta \ge 1$ and
$Q_{\tau_{T^*}} \ge T^*$. Finally, (f) immediately follows from plugging in the definition of $T^*$ from (13).

Therefore, the risky policy $\pi^{T^*}$ is feasible by Definition 2. Observe that the buffer size under
any policy $\pi^T$ of the form (12) with $T \ge T^*$ stochastically dominates that of policy $\pi^{T^*}$, because
$\pi^T$ switches to the costly server earlier, and stays in that state longer. Hence, $\pi^T$ is feasible for
all $T \ge T^*$. ■

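Proof of Theorem 6. Similarly to the proof of Theorem 5, we first consider the risky policy $\pi^T$
with the initial buffer size $T$. By definition of $\pi^T$, the costly server is used until the threshold $T$
is crossed. Thus the expected cost of this policy is bounded by the expected time until crossing
the threshold plus the expected cost given that the threshold is crossed, i.e.,

$$J^{\pi^T}(T,\epsilon) \le \frac{E[Q_{\tau_T} - T]}{R_1 - 1} + E\big[e^{-\alpha_0 (Q_{\tau_T} - T)}\big]\, J^{\pi^T}(T,\epsilon),$$

where $\tau_T$ is defined in (38). The above relation implies

$$\begin{aligned}
J^{\pi^T}(T,\epsilon) &\le \frac{1}{R_1 - 1} \cdot \frac{E[Q_{\tau_T} - T]}{1 - E\big[e^{-\alpha_0 (Q_{\tau_T} - T)}\big]} \\
&\le \frac{1}{R_1 - 1} \cdot \frac{E[Q_{\tau_T} - T]}{1 - E\big[1 - \alpha_0 (Q_{\tau_T} - T) + \frac{\alpha_0^2}{2}(Q_{\tau_T} - T)^2\big]} \\
&= \frac{1}{\alpha_0 (R_1 - 1)} \cdot \frac{1}{1 - \frac{\alpha_0}{2} \cdot \frac{E[(Q_{\tau_T} - T)^2]}{E[Q_{\tau_T} - T]}} \\
&\le \frac{1}{\alpha_0 (R_1 - 1)\big(1 - \frac{\alpha_0}{2}\big)} = \frac{\beta}{\alpha_1 (R_1 - 1)},
\end{aligned} \tag{45}$$

where the second inequality follows from the fact in (43), and the last equality holds by definition
of $\beta$. Now for any $D \ge \bar{D}$ we can write

$$\begin{aligned}
J^{\pi^{T^*}}(D,\epsilon) &= \Pr\Big(\min_{t \ge 0} Q_t \le T^* \,\Big|\, Q_0 = D\Big)\, J^{\pi^{T^*}}(T^*,\epsilon) \\
&= e^{-\alpha_0 (D - T^*)}\, J^{\pi^{T^*}}(T^*,\epsilon),
\end{aligned}$$

where the last step holds by Lemma 1. Combining this with (45) gives the result in (14).

If $D < \bar{D}$, the risky policy uses the costly server until the threshold $T^*$ is crossed at $\tau_{T^*}$ or
the interruption event occurs at $\tau_e$, whichever happens first. Afterwards, no extra cost is incurred if an
interruption has occurred. Otherwise, by (45) an extra cost of at most $\frac{\beta}{\alpha_1 (R_1 - 1)}$ is incurred, i.e.,

$$J^{\pi^{T^*}}(D,\epsilon) \le E[\min\{\tau_e, \tau_{T^*}\}] + \Pr(\tau_{T^*} < \tau_e)\, \frac{\beta}{\alpha_1 (R_1 - 1)}.$$

By Doob's optional stopping theorem applied to the Martingale $Z_t = Q_t - (R_1 - 1)t$, we
obtain

$$D = \Pr(\tau_{T^*} < \tau_e)\, E[Q_{\tau_{T^*}} \mid \tau_{T^*} < \tau_e] - (R_1 - 1)\, E[\min\{\tau_e, \tau_{T^*}\}],$$

which implies

$$E[\min\{\tau_e, \tau_{T^*}\}] \le \frac{\Pr(\tau_{T^*} < \tau_e)(T^* + 1) - D}{R_1 - 1}.$$

By combining the preceding relations we conclude that

$$J^{\pi^{T^*}}(D,\epsilon) \le \frac{\Pr(\tau_{T^*} < \tau_e)}{R_1 - 1}\Big(T^* + 1 + \frac{\beta}{\alpha_1}\Big) - \frac{D}{R_1 - 1},$$

which immediately implies (15) by employing Lemma 3.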



Appendix B 
Analysis of the Threshold Policy for the Fluid Approximation Model 

Proof of Lemma 2. There are multiple approaches to prove the claim. We prove a more general 
case using Doob's optional stopping theorem that will be useful in the later arguments. We only 
consider i = 0; the other case is the same. 

Let $Y_t = e^{-\theta_0 Q_t}$. It is straightforward to show that $Y_t$ is a martingale with respect to $W_t$. Now consider the boundary crossing problem, where we are interested in the probability of hitting zero before a boundary $b > D$. Let $\tau$ denote the hitting time of either boundary. For any $n > 0$, we may apply Doob's optional stopping theorem [28] to the stopped martingale $Y_{\tau \wedge n}$ to write

$$
E[Y_{\tau \wedge n}] = E\big[e^{-\theta_0 Q_{\tau \wedge n}}\big] = e^{-\theta_0 D}, \quad \text{for all } n.
$$

Now, we take the limit as $n \to \infty$ and use the dominated convergence theorem to establish:

$$
E[Y_\tau] = E\big[e^{-\theta_0 Q_\tau}\big] = \lim_{n \to \infty} E\big[e^{-\theta_0 Q_{\tau \wedge n}}\big] = e^{-\theta_0 D}. \qquad (46)
$$

Finally, using the Borel-Cantelli Lemma, we can show that $\tau$ is finite with probability one, which allows us to decompose (46) and characterize the boundary crossing probabilities as

$$
\Pr(Q_\tau = 0) \cdot 1 + \Pr(Q_\tau = b) \cdot e^{-\theta_0 b} = e^{-\theta_0 D},
$$

$$
\Pr(Q_\tau = 0) + \Pr(Q_\tau = b) = 1.
$$

Solving the above equations gives 

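$$
\Pr(Q_\tau = 0) = \frac{e^{-\theta_0 D} - e^{-\theta_0 b}}{1 - e^{-\theta_0 b}}, \qquad (47)
$$

$$
\Pr(Q_\tau = b) = \frac{1 - e^{-\theta_0 D}}{1 - e^{-\theta_0 b}}. \qquad (48)
$$

Taking the limit as $b \to \infty$ proves the claim. ■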
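As a quick numerical sanity check of (47) and (48), the following Python sketch evaluates the two boundary crossing probabilities and confirms that they sum to one, satisfy the optional stopping identity (46), and approach $e^{-\theta_0 D}$ as the upper boundary grows. The function name and the parameter values are illustrative assumptions, not part of the original analysis.

```python
import math


def boundary_crossing_probs(theta, D, b):
    """Return (Pr(Q_tau = 0), Pr(Q_tau = b)) as given by (47)-(48).

    Assumes theta > 0, b > 0 and 0 <= D <= b.
    """
    denom = 1.0 - math.exp(-theta * b)
    hit_zero = (math.exp(-theta * D) - math.exp(-theta * b)) / denom
    hit_b = (1.0 - math.exp(-theta * D)) / denom
    return hit_zero, hit_b


if __name__ == "__main__":
    theta0, D = 0.4, 3.0  # illustrative values only
    for b in (5.0, 20.0, 80.0):
        p0, pb = boundary_crossing_probs(theta0, D, b)
        # Optional stopping identity: p0 * 1 + pb * exp(-theta0 * b) = exp(-theta0 * D)
        gap = p0 + pb * math.exp(-theta0 * b) - math.exp(-theta0 * D)
        print(f"b={b:5.1f}  Pr(hit 0)={p0:.6f}  sum={p0 + pb:.6f}  identity gap={gap:.2e}")
    print("Lemma 2 limit exp(-theta0 * D):", math.exp(-theta0 * D))
```

As $b$ increases, the printed probability of hitting zero converges to the limit stated in Lemma 2.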

Proof of Theorem 7. We first characterize the interruption probability for the cases $D \ge T$ and $D < T$ given $p(T)$, which is the interruption probability starting from $D = T$. For any $x \ge 0$, define $\tau_x$ as the first hitting time of boundary $x$, i.e.,

$$
\tau_x = \inf\{t \ge 0 : Q_t = x\}. \qquad (49)
$$

For the case $D \ge T$, using path-continuity of $Q_t$, the strong Markov property and Lemma 2, we have

$$
\begin{aligned}
p(D) &= \Pr(\tau_0 < \infty \mid Q_0 = D) \\
&= \Pr(\tau_0 < \infty \mid Q_0 = T) \cdot \Pr(\tau_T < \infty \mid Q_0 = D) \\
&= e^{-\theta_0(D-T)}\, p(T).
\end{aligned}
$$

For the case $D < T$, we use the boundary crossing probabilities (47) and (48) that we derived in the proof of Lemma 2. Note that for the threshold policy when $D < T$, the drift is set to $R_1 - 1 = \frac{\theta_1}{2}$. Hence, by the total probability theorem and the strong Markov property, we obtain

$$
\begin{aligned}
p(D) &= \Pr(\tau_0 < \infty \mid Q_0 = D) \\
&= 1 \cdot \Pr(\tau_0 < \tau_T \mid Q_0 = D) + \Pr(\tau_0 < \infty \mid Q_0 = T) \cdot \Pr(\tau_T < \tau_0 \mid Q_0 = D) \\
&= \frac{e^{-\theta_1 D} - e^{-\theta_1 T}}{1 - e^{-\theta_1 T}} + p(T)\,\frac{1 - e^{-\theta_1 D}}{1 - e^{-\theta_1 T}}.
\end{aligned}
$$

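We may obtain the desired result after simple manipulations of the above relation, once we compute $p(T)$.

In order to characterize $p(T)$, we use an analogue of one-step deviation analysis for Markov chains. Let $Q_0 = T$, and consider a small deviation $Q_h$, where $h$ is a small time-step. Since $Q_t$ is a Brownian motion with drift, $Q_h$ has a normal distribution with variance $h$ and mean $T + ah$, where $a \in [R_0 - 1, R_1 - 1]$. Therefore, the probability of $Q_h > T$ is $(\frac{1}{2} + \delta) + o(h)$, where $\delta$ is a small constant of the same order as $h$, so that $\frac{\delta}{\sqrt{h}} \to 0$ as $h \to 0$. By the strong Markov property of the Brownian motion, (47) and (48), we have

$$
\begin{aligned}
p(T) &= \Pr(\tau_0 < \infty \mid Q_0 = T) \\
&= \Pr(\tau_0 < \infty \mid Q_h > T)\big(\tfrac{1}{2} + \delta\big) + \Pr(\tau_0 < \infty \mid Q_h < T)\big(\tfrac{1}{2} - \delta\big) + o(h) \\
&= \Big[p(T)\, E_{Q_h}\big[\Pr(\tau_T < \infty) \mid Q_h > T\big]\Big]\big(\tfrac{1}{2} + \delta\big) \\
&\quad + \Big[1 \cdot E_{Q_h}\big[\Pr(\tau_0 < \tau_T) \mid Q_h < T\big] + p(T)\, E_{Q_h}\big[\Pr(\tau_T < \tau_0) \mid Q_h < T\big]\Big]\big(\tfrac{1}{2} - \delta\big) + o(h) \\
&= \Big[p(T)\, E\big[e^{-\theta_0 Z} \mid Z > 0\big]\Big]\big(\tfrac{1}{2} + \delta\big) \\
&\quad + \Big[E\Big[\frac{e^{-\theta_1 T}\big(e^{-\theta_1 Z} - 1\big)}{1 - e^{-\theta_1 T}} \,\Big|\, Z < 0\Big] + p(T)\, E\Big[\frac{1 - e^{-\theta_1 T} e^{-\theta_1 Z}}{1 - e^{-\theta_1 T}} \,\Big|\, Z < 0\Big]\Big]\big(\tfrac{1}{2} - \delta\big) + o(h), \qquad (50)
\end{aligned}
$$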



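where $Z = Q_h - T$ is a Gaussian random variable with mean $ah$ and variance $h$. In order to obtain $p(T)$, we need to compute $E[e^{-\theta_0 Z} \mid Z > 0]$ and $E[e^{-\theta_1 Z} \mid Z < 0]$. We may compute these expressions exactly, but it is simpler to compute upper and lower bounds and then take the limit as $h \to 0$. By (43), we have

$$
\begin{aligned}
E\big[e^{-\theta_0 Z} \mid Z > 0\big] &\le E\Big[1 - \theta_0 Z + \frac{\theta_0^2}{2} Z^2 \,\Big|\, Z > 0\Big] = 1 - \theta_0 \beta \sqrt{h} + o(\sqrt{h}), \\
E\big[e^{-\theta_0 Z} \mid Z > 0\big] &\ge E\big[1 - \theta_0 Z \mid Z > 0\big] = 1 - \theta_0 \beta \sqrt{h} + o(\sqrt{h}),
\end{aligned}
\qquad (51)
$$

where $\beta$ is a constant. Similarly, we get

$$
1 + \theta_1 \beta \sqrt{h} + o(\sqrt{h}) \le E\big[e^{-\theta_1 Z} \mid Z < 0\big] \le 1 + \theta_1 \beta \sqrt{h} + o(\sqrt{h}).
$$

Plugging these relations back in (50), dividing by $\beta\sqrt{h}$ and taking the limit as $h$ goes to zero, we obtain the following equation

$$
p(T)\Big[\frac{\theta_1 e^{-\theta_1 T}}{1 - e^{-\theta_1 T}} + \theta_0\Big] = \frac{\theta_1 e^{-\theta_1 T}}{1 - e^{-\theta_1 T}}, \qquad (52)
$$

which gives the desired result for $p(T)$ after rearranging the terms. ■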
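The two cases of the proof can be combined into a single function once $p(T)$ is known. The sketch below assumes the closed form $p(T) = \theta_1 / (\theta_1 - \theta_0 + \theta_0 e^{\theta_1 T})$, obtained by solving (52) for $p(T)$; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def p_at_threshold(T, theta0, theta1):
    """Interruption probability when starting exactly at the threshold T.

    Assumed closed form, obtained by solving (52) for p(T).
    """
    return theta1 / (theta1 - theta0 + theta0 * math.exp(theta1 * T))


def interruption_prob(D, T, theta0, theta1):
    """Interruption probability p(D) under the threshold policy (requires T > 0)."""
    pT = p_at_threshold(T, theta0, theta1)
    if D >= T:
        # Above the threshold the costly server is off (drift theta0 / 2),
        # so the buffer must first fall to T, as in Lemma 2.
        return math.exp(-theta0 * (D - T)) * pT
    # Below the threshold the costly server is on (drift theta1 / 2);
    # use the boundary crossing probabilities (47)-(48) on [0, T].
    hit_zero_first = (math.exp(-theta1 * D) - math.exp(-theta1 * T)) / (
        1.0 - math.exp(-theta1 * T)
    )
    hit_T_first = 1.0 - hit_zero_first
    return hit_zero_first + pT * hit_T_first


if __name__ == "__main__":
    theta0, theta1, T = 0.2, 1.0, 2.0  # illustrative values only
    for D in (0.0, 1.0, 2.0, 4.0):
        print(f"D={D:.1f}  p(D)={interruption_prob(D, T, theta0, theta1):.6f}")
```

A useful consistency check is the degenerate case $\theta_0 = \theta_1$, where the threshold has no effect and the function reduces to $e^{-\theta_0 D}$ of Lemma 2.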

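Proof of Theorem 8. The proof technique for this theorem is analogous to that of Theorem 7. First, we consider the cases $D \ge T$ and $D < T$ and characterize the expected cost in terms of $J(T)$. Let $\tau_x$ be defined as in (49).

For the case $D \ge T$, note that no cost is incurred until the threshold $T$ is reached. Hence

$$
\begin{aligned}
J(D) &= E\Big[\int_0^{\tau_0} u_t\,dt \,\Big|\, Q_0 = D\Big] \\
&= E\Big[\int_0^{\tau_0} u_t\,dt \,\Big|\, \tau_T = \infty, Q_0 = D\Big] \Pr(\tau_T = \infty \mid Q_0 = D) \\
&\quad + E\Big[\int_0^{\tau_0} u_t\,dt \,\Big|\, \tau_T < \infty, Q_0 = D\Big] \Pr(\tau_T < \infty \mid Q_0 = D) \\
&= 0 + E\Big[\int_{\tau_T}^{\tau_0} u_t\,dt \,\Big|\, \tau_T < \infty, Q_0 = D\Big] \Pr(\tau_T < \infty \mid Q_0 = D) \\
&\overset{(a)}{=} E\Big[\int_0^{\tau_0} u_t\,dt \,\Big|\, Q_0 = T\Big] \Pr(\tau_T < \infty \mid Q_0 = D) \\
&= J(T) \Pr(\tau_T < \infty \mid Q_0 = D) \overset{(b)}{=} J(T)\, e^{-\theta_0(D-T)},
\end{aligned}
$$

where (a) follows from the memoryless property of Brownian motion and (b) is a consequence of Lemma 2.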

For the case $D < T$, we can use the strong Markov property to write the following for a small time-step $h$:

$$
\begin{aligned}
J(D) = J(Q_0) &= 1 \cdot h + E_W[J(Q_h)] \\
&= h + E_W\Big[J(D) + \frac{\partial J}{\partial D}\big((R_1 - 1)h + W_h\big) + \frac{1}{2}\frac{\partial^2 J}{\partial D^2} W_h^2\Big] + o(h) \\
&= h + J(D) + (R_1 - 1)h \cdot \frac{\partial J}{\partial D} + \frac{1}{2} \cdot \frac{\partial^2 J}{\partial D^2} \cdot h + o(h),
\end{aligned}
$$

which gives the following ordinary differential equation after dividing by $h$ and taking the limit as $h \to 0$:

$$
\frac{1}{2}\frac{\partial^2 J}{\partial D^2} + (R_1 - 1)\frac{\partial J}{\partial D} + 1 = 0. \qquad (53)
$$

It is straightforward to solve the differential equation in (53) with the boundary condition $J(0) = 0$, and $J(T)$ as a parameter. This result completes the characterization of $J(D)$ described in (30) as a function of $J(T)$. Also, note that if we set the boundary condition $J(T) = 0$, $J(D)$ gives the expected time to hit either of the boundaries at $0$ or $T$, i.e., we get

$$
E\big[\min\{\tau_0, \tau_T\} \mid Q_0 = D\big] = \frac{2}{\theta_1}\Big[T\,\frac{1 - e^{-\theta_1 D}}{1 - e^{-\theta_1 T}} - D\Big]. \qquad (54)
$$



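Now, we use a similar technique as in the proof of Theorem 7 to compute $J(T)$. Consider a small time step deviation $h > 0$ from the initial condition $Q_0 = T$. Similarly to (50), we have

$$
\begin{aligned}
J(T) &= \gamma h + \Big[J(T)\, E_{Q_h}\big[\Pr(\tau_T < \infty) \mid Q_h > T\big]\Big] \Pr(Q_h > T \mid Q_0 = T) \\
&\quad + \Big[1 \cdot E\big[\min\{\tau_0, \tau_T\} \mid Q_h < T\big] + 0 \cdot E_{Q_h}\big[\Pr(\tau_0 < \tau_T) \mid Q_h < T\big] \\
&\qquad\; + J(T)\, E_{Q_h}\big[\Pr(\tau_T < \tau_0) \mid Q_h < T\big]\Big] \Pr(Q_h < T \mid Q_0 = T) + o(h) \\
&= \gamma h + \Big[J(T)\, E\big[e^{-\theta_0 Z} \mid Z > 0\big]\Big]\big(\tfrac{1}{2} + \delta\big) \\
&\quad + \Big[E\big[\min\{\tau_0, \tau_T\} \mid Q_h < T\big] + J(T)\, E\Big[1 - \frac{e^{-\theta_1 T}\big(e^{-\theta_1 Z} - 1\big)}{1 - e^{-\theta_1 T}} \,\Big|\, Z < 0\Big]\Big]\big(\tfrac{1}{2} - \delta\big) + o(h),
\end{aligned}
$$

where $\gamma$ is a constant bounded by 1, $\delta = \Theta(h)$, and $Z = Q_h - T$ is a Gaussian random variable with mean $ah$ and variance $h$ for some constant $a$. The second equality in the preceding relations follows from (48) and Lemma 2. By (54), applying the bounds in (51) and (52), dividing by $\beta\sqrt{h}$ and taking the limit as $h \to 0$, we obtain the following equation

$$
J(T)\Big[\frac{\theta_1 e^{-\theta_1 T}}{1 - e^{-\theta_1 T}} + \theta_0\Big] = \frac{2}{\theta_1}\Big[1 - \frac{\theta_1 T e^{-\theta_1 T}}{1 - e^{-\theta_1 T}}\Big],
$$

which gives us the desired expression in (31) after rearranging the terms.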
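To make the result concrete, the sketch below evaluates the expected cost $J(D)$ of the threshold policy by combining the case $D \ge T$, the solution of (53) for $D < T$, and the value of $J(T)$ obtained by solving the last equation for $J(T)$. The closed forms coded here are assumptions derived from those relations with $\theta_1 = 2(R_1 - 1)$; all function names and parameter values are hypothetical.

```python
import math


def cost_at_threshold(T, theta0, theta1):
    """Expected cost J(T) when starting at the threshold (assumed closed form)."""
    num = math.exp(theta1 * T) - 1.0 - theta1 * T
    den = theta1 - theta0 + theta0 * math.exp(theta1 * T)
    return (2.0 / theta1) * num / den


def expected_cost(D, T, theta0, theta1):
    """Expected cost J(D) under the threshold policy (requires T > 0)."""
    JT = cost_at_threshold(T, theta0, theta1)
    if D >= T:
        # No cost accrues until the buffer falls to T,
        # which happens with probability exp(-theta0 * (D - T)).
        return JT * math.exp(-theta0 * (D - T))
    # Solution of (1/2) J'' + (theta1 / 2) J' + 1 = 0 with J(0) = 0 and J(T) given.
    ratio = (1.0 - math.exp(-theta1 * D)) / (1.0 - math.exp(-theta1 * T))
    return (JT + 2.0 * T / theta1) * ratio - 2.0 * D / theta1


def ode_residual(D, T, theta0, theta1, h=1e-3):
    """Central finite-difference residual of (53) at a point with h < D < T - h."""
    J = lambda q: expected_cost(q, T, theta0, theta1)
    d1 = (J(D + h) - J(D - h)) / (2.0 * h)
    d2 = (J(D + h) - 2.0 * J(D) + J(D - h)) / (h * h)
    return 0.5 * d2 + 0.5 * theta1 * d1 + 1.0


if __name__ == "__main__":
    theta0, theta1, T = 0.2, 1.0, 2.0  # illustrative values only
    for D in (0.0, 1.0, 2.0, 4.0):
        print(f"D={D:.1f}  J(D)={expected_cost(D, T, theta0, theta1):.6f}")
    print("ODE residual at D=1:", ode_residual(1.0, T, theta0, theta1))
```

Under these assumptions the two branches agree in value and in slope at $D = T$, which is an easy way to check the algebra behind $J(T)$.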



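Proof of Theorem 9. In order to facilitate verification of the HJB equation (26), we rewrite and slightly manipulate the candidate solution $V(Q,p)$ given by (32). Recall that

$$
V(Q,p) = \begin{cases} V_0(Q,p), & Q \ge T(Q,p), \\ V_1(Q,p), & Q < T(Q,p), \end{cases} \qquad (55)
$$

where

$$
V_0(Q,p) = e^{-\theta_0(Q - T(Q,p))}\, J(T(Q,p)), \qquad (56)
$$

$$
V_1(Q,p) = \Big[J(T(Q,p)) + \frac{2}{\theta_1} T(Q,p)\Big] \frac{1 - e^{-\theta_1 Q}}{1 - e^{-\theta_1 T(Q,p)}} - \frac{2}{\theta_1} Q, \qquad (57)
$$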
$$
J(T(Q,p)) = \frac{2}{\theta_1} \cdot \frac{e^{\theta_1 T(Q,p)} - \theta_1 T(Q,p) - 1}{\theta_1 - \theta_0 + \theta_0 e^{\theta_1 T(Q,p)}}. \qquad (58)
$$



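Note that $p^{T(Q,p)}(Q) = p$; and by definition of $p^{T}(\cdot)$ in (29) we may verify that the condition $Q \ge T(Q,p)$ is equivalent to $p \ge \frac{\theta_1}{\theta_0 e^{\theta_1 Q} + \theta_1 - \theta_0}$. Therefore, we can partition the feasible region $\mathcal{R}$ into two sub-regions $\mathcal{R}_0$ and $\mathcal{R}_1$, such that $Q \ge T(Q,p)$ for all $(Q,p) \in \mathcal{R}_0$ and $Q < T(Q,p)$ for all $(Q,p) \in \mathcal{R}_1$.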

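Hence, we need to verify the HJB equation for the two regions separately, using the proper expression in (55). In order to verify the HJB equation for the candidate solution (32), we also need to characterize the optimal value of the minimization problem in (26). First, we characterize the optimal solution pair $(u^*, \rho^*)$ for any feasible state $(Q,p) \in \mathcal{R}$. Observe that the optimization problem in (26) can be decomposed into two smaller problems:

$$
u^*(Q,p) = \arg\min_{u \in \{0,1\}} \Big\{ u + \frac{\partial V}{\partial Q}(R_u - 1) \Big\}, \qquad (59)
$$

$$
\rho^*(Q,p) = \arg\min_{\rho} \Big\{ \frac{1}{2}\frac{\partial^2 V}{\partial p^2}\rho^2 + \frac{\partial^2 V}{\partial Q \partial p}\rho \Big\}. \qquad (60)
$$

The minimization problem in (60) is quadratic and hence convex in $\rho$. So we can use the first-order optimality condition to get

$$
\rho^*(Q,p) = -\frac{\partial^2 V / \partial Q \partial p}{\partial^2 V / \partial p^2}. \qquad (61)
$$

For the problem in (59), $u^*(Q,p) = 0$ if and only if

$$
\frac{\partial V}{\partial Q}(R_0 - 1) \le 1 + \frac{\partial V}{\partial Q}(R_1 - 1),
$$

or equivalently

$$
\frac{\partial V}{\partial Q} \ge -\frac{1}{R_1 - R_0}. \qquad (62)
$$

Using the chain rule and the implicit function theorem, we can analytically calculate $\frac{\partial V}{\partial Q}$ from (55) to conclude that the condition in (62) holds if and only if $(Q,p) \in \mathcal{R}_0$. In other words, $u^*(Q,p) = 0$ for all $(Q,p) \in \mathcal{R}_0$ and $u^*(Q,p) = 1$ for all $(Q,p) \in \mathcal{R}_1$.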
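In summary, the HJB equation in (26) boils down to the following equations:

$$
\frac{\theta_0}{2}\frac{\partial V_0}{\partial Q} + \frac{1}{2}\frac{\partial^2 V_0}{\partial Q^2} - \frac{1}{2}\Big(\frac{\partial^2 V_0}{\partial Q \partial p}\Big)^2 \Big/ \frac{\partial^2 V_0}{\partial p^2} = 0, \quad \text{for all } (Q,p) \in \mathcal{R}_0, \qquad (63)
$$

$$
1 + \frac{\theta_1}{2}\frac{\partial V_1}{\partial Q} + \frac{1}{2}\frac{\partial^2 V_1}{\partial Q^2} - \frac{1}{2}\Big(\frac{\partial^2 V_1}{\partial Q \partial p}\Big)^2 \Big/ \frac{\partial^2 V_1}{\partial p^2} = 0, \quad \text{for all } (Q,p) \in \mathcal{R}_1, \qquad (64)
$$

where $V_0(Q,p)$ and $V_1(Q,p)$ are given by (56) and (57), respectively. The verification of (63) and (64) is straightforward but tedious. We omit the details for brevity. We may simply use symbolic analysis tools such as Mathematica for this part. ■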



Proof of Theorem 10. From the proof of Theorem 9, we have characterized the optimal policy $\pi^* : \mathbb{R} \times [0,1] \to \{0,1\} \times \mathbb{R}$ as

$$
\pi^*(Q,p) = \big(u^*(Q,p), \rho^*(Q,p)\big),
$$

where $u^*(Q,p) = 0$ if and only if $(Q,p) \in \mathcal{R}_0$ and $\rho^*(Q,p)$ given by (61) can be explicitly computed as follows (the second case is written here in simplified form):

$$
\rho^*(Q,p) = \begin{cases} -\theta_0\, p, & \text{for all } (Q,p) \in \mathcal{R}_0, \\[4pt] -\dfrac{\theta_1 (1 - p)}{e^{\theta_1 Q} - 1}, & \text{for all } (Q,p) \in \mathcal{R}_1. \end{cases} \qquad (65)
$$

Moreover, the dynamics of the state process under the optimal control policy is given by:

$$
dp^*_t = \rho^*(Q^*_t, p^*_t)\, dW_t.
$$

For the proof of the first claim, observe that for a given manifold $\mathcal{M}(D,\epsilon)$, we have

$$
T(Q,p) = T(D,\epsilon), \quad \text{for all } (Q,p) \in \mathcal{M}(D,\epsilon). \qquad (66)
$$

This claim holds by definition of $\mathcal{M}(D,\epsilon)$ in (33), and the fact that $T(Q,p)$ is the unique solution of $p^{T}(Q) = p$. Next, we show that if $(Q^*_t, p^*_t) \in \mathcal{M}(D,\epsilon)$ for any $t \ge 0$, then after executing the optimal policy $\pi^*$, the state process stays on the manifold $\mathcal{M}(D,\epsilon)$. First, consider the case where $Q^*_t \ge T(Q^*_t, p^*_t) = T(D,\epsilon)$. In this case, $u^*(Q^*_t, p^*_t) = 0$ and $\rho^*(Q^*_t, p^*_t) = -\theta_0 p^*_t$. We would like to show that the solution of the stochastic differential equation $dp^*_t = -\theta_0 p^*_t\, dW_t$ coincides with the invariant manifold given by $p_t = e^{-\theta_0(Q^*_t - T(D,\epsilon))}\, p(T(D,\epsilon))$. By employing Ito's Lemma we can check that $e^{-\theta_0(Q^*_t - T(D,\epsilon))}\, p(T(D,\epsilon))$ is indeed the desired solution. In particular, using the proper evolution of the queue-length process $Q^*_t$, we can write

$$
\begin{aligned}
dp_t &= \frac{1}{2}\theta_0^2\, e^{-\theta_0(Q^*_t - T(D,\epsilon))}\, p(T(D,\epsilon))\, dt - \theta_0\, e^{-\theta_0(Q^*_t - T(D,\epsilon))}\, p(T(D,\epsilon)) \Big(\frac{\theta_0}{2}\, dt + dW_t\Big) \\
&= -\theta_0\, p_t\, dW_t.
\end{aligned}
$$

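Next, consider the case where $Q^*_t < T(Q^*_t, p^*_t) = T(D,\epsilon)$. In this case, $u^*(Q^*_t, p^*_t) = 1$ and $\rho^*(Q^*_t, p^*_t)$ is given by (65). Similarly to the previous case, we may use Ito's Lemma to verify that the state process stays on the invariant manifold given by

$$
p_t = p^{T(D,\epsilon)}(Q^*_t).
$$

By Ito's Lemma we have

$$
\begin{aligned}
dp_t &= \frac{\partial p^{T(D,\epsilon)}(Q^*_t)}{\partial Q}\, dQ^*_t + \frac{1}{2} \cdot \frac{\partial^2 p^{T(D,\epsilon)}(Q^*_t)}{\partial Q^2}\, dt \\
&= \frac{\big(p(T(D,\epsilon)) - 1\big)\,\theta_1 e^{-\theta_1 Q^*_t}}{1 - e^{-\theta_1 T(D,\epsilon)}} \Big(\frac{\theta_1}{2}\, dt + dW_t\Big) - \frac{\big(p(T(D,\epsilon)) - 1\big)\,\theta_1^2 e^{-\theta_1 Q^*_t}}{2\big(1 - e^{-\theta_1 T(D,\epsilon)}\big)}\, dt \\
&= \frac{\big(p(T(D,\epsilon)) - 1\big)\,\theta_1 e^{-\theta_1 Q^*_t}}{1 - e^{-\theta_1 T(D,\epsilon)}}\, dW_t \\
&= \rho^*(Q^*_t, p^*_t)\, dW_t,
\end{aligned}
$$

which completes the proof of the first claim.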

Now that we have established that the state process starting from $(D,\epsilon)$ under optimal control stays on a one-dimensional invariant manifold $\mathcal{M}(D,\epsilon)$, the optimality of the threshold policy $\pi^{T(D,\epsilon)}(Q)$ is immediate. Recall that the decision process of importance is $u_t \in \{0,1\}$, and we know that $u^*(Q,p) = 0$ if and only if $Q \ge T(Q,p)$. Moreover, since the optimal state process stays on $\mathcal{M}(D,\epsilon)$, we have $T(Q^*_t, p^*_t) = T(D,\epsilon)$. Hence, the optimal control policy (given the initial condition) chooses the action $u^*(Q,p) = 0$ if and only if $Q \ge T(D,\epsilon)$. Therefore, the optimal policy $\pi^*(Q,p)$ coincides with the threshold policy $\pi^{T(D,\epsilon)}(Q)$. We may also verify that the interruption probability under the threshold policy conditioned on the history up to time $t$ is given by $p_t$. ■
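The invariance argument says that, along the manifold, the drift terms produced by Ito's Lemma cancel, so that $p_t$ is driven only by $dW_t$. The sketch below checks this numerically with finite differences in both regions. It assumes the interruption probability $p^{T}(Q)$ of Theorem 7 in the closed form coded in `p_threshold`, a queue drift of $\theta_u/2$ under action $u$, and, for the region $Q < T$, the simplified diffusion coefficient $-\theta_1(1-p)/(e^{\theta_1 Q}-1)$ that follows from that closed form. All names and parameter values are hypothetical.

```python
import math


def p_threshold(Q, T, theta0, theta1):
    """Interruption probability p^T(Q) of the threshold policy (assumed form, T > 0)."""
    pT = theta1 / (theta1 - theta0 + theta0 * math.exp(theta1 * T))
    if Q >= T:
        return math.exp(-theta0 * (Q - T)) * pT
    ratio = (1.0 - math.exp(-theta1 * Q)) / (1.0 - math.exp(-theta1 * T))
    return 1.0 - (1.0 - pT) * ratio


def ito_coefficients(Q, T, theta0, theta1, h=1e-3):
    """Return (drift, diffusion) of p_t = p^T(Q_t) at a point Q away from T.

    With dQ = (theta_u / 2) dt + dW, Ito's Lemma gives
    drift = (theta_u / 2) p' + (1/2) p'' and diffusion = p'.
    Derivatives are approximated by central finite differences.
    """
    theta_u = theta0 if Q >= T else theta1
    f = lambda q: p_threshold(q, T, theta0, theta1)
    d1 = (f(Q + h) - f(Q - h)) / (2.0 * h)
    d2 = (f(Q + h) - 2.0 * f(Q) + f(Q - h)) / (h * h)
    return 0.5 * theta_u * d1 + 0.5 * d2, d1


if __name__ == "__main__":
    theta0, theta1, T = 0.2, 1.0, 2.0  # illustrative values only
    for Q in (1.0, 3.0):
        drift, diffusion = ito_coefficients(Q, T, theta0, theta1)
        print(f"Q={Q:.1f}  drift={drift:.2e}  diffusion={diffusion:.6f}")
```

In both regions the printed drift is zero up to discretization error, while the diffusion coefficient matches $-\theta_0 p$ above the threshold.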